Abstract: For the past two decades, the single-index model, a special case of projection pursuit regression, has proven to be an efficient way of coping with the high-dimensional problem in nonparametric regression. In this paper, based on a weakly dependent sample, we investigate the single-index prediction (SIP) model, which is robust against deviation from the single-index model. The single index is identified by the best approximation to the multivariate prediction function of the response variable, regardless of whether the prediction function is a genuine single-index function. A polynomial spline estimator is proposed for the single-index prediction coefficients and is shown to be root-n consistent and asymptotically normal. The iterative optimization routine used is fast enough for the user to analyze large, high-dimensional data sets within seconds. Simulation experiments provide strong evidence that corroborates the asymptotic theory. Application of the proposed procedure to the river flow data of Iceland has yielded superior out-of-sample rolling forecasts.

Key words and phrases: B-spline, geometric mixing, knots, nonparametric regression, root-n rate, 
strong consistency. 

1. Introduction 

Let $\{X_i^T, Y_i\}_{i=1}^n = \{X_{i,1}, \ldots, X_{i,d}, Y_i\}_{i=1}^n$ be a length-$n$ realization of a $(d+1)$-dimensional strictly stationary process following the heteroscedastic model

$$Y_i = m(X_i) + \sigma(X_i)\varepsilon_i, \qquad m(X_i) = E(Y_i \mid X_i), \tag{1.1}$$

in which $E(\varepsilon_i \mid X_i) = 0$, $E(\varepsilon_i^2 \mid X_i) = 1$, $1 \le i \le n$. The $d$-variate functions $m$ and $\sigma$ are the unknown mean and standard deviation of the response $Y$ conditional on the predictor vector $X_i$, often estimated nonparametrically. In what follows, we let $(X^T, Y, \varepsilon)$ have the stationary distribution of $(X_i^T, Y_i, \varepsilon_i)$. When the dimension of $X$ is high, one unavoidable issue is the "curse of dimensionality", which refers to the poor convergence rate of nonparametric estimation of a general multivariate function. Much effort has been devoted to circumventing this difficulty. In the words of Xia, Tong, Li and Zhu (2002), there are essentially two approaches: function approximation and dimension reduction. A favorite function approximation technique is the generalized additive model advocated by Hastie and Tibshirani (1990); see also, for example, Mammen, Linton and Nielsen (1999), Huang and Yang (2004), Xue and Yang (2006a, b), Wang and Yang (2007). An attractive dimension reduction method is the single-index model, similar to the first step of projection pursuit regression; see Friedman and Stuetzle (1981), Hall (1989), Huber (1985), Chen (1991). The basic appeal of the single-index model is its simplicity: the $d$-variate function $m(x) = m(x_1, \ldots, x_d)$ is expressed as a univariate function of $x^T\theta_0 = \sum_{p=1}^d x_p\theta_{0,p}$. Over the last two decades, many authors have devised various intelligent estimators of the single-index coefficient vector $\theta_0 = (\theta_{0,1}, \ldots, \theta_{0,d})^T$, for instance, Powell, Stock and Stoker (1989), Härdle and Stoker (1989), Ichimura (1993), Klein and Spady (1993), Härdle, Hall and Ichimura (1993), Horowitz and Härdle (1996), Carroll, Fan, Gijbels and Wand (1997), Xia and Li (1999), Hristache, Juditski and Spokoiny (2001). More recently, Xia, Tong, Li and Zhu (2002) proposed the minimum average variance estimation (MAVE) for several index vectors.

All the aforementioned methods assume that the $d$-variate regression function $m(x)$ is exactly a univariate function of some $x^T\theta_0$ and obtain a root-$n$ consistent estimator of $\theta_0$. If this model is misspecified ($m$ is not a genuine single-index function), however, a goodness-of-fit test becomes necessary and the estimation of $\theta_0$ must be redefined; see Xia, Li, Tong and Zhang (2004). In this paper, instead of presuming that the underlying true function $m$ is a single-index function, we estimate a univariate function $g$ that optimally approximates the multivariate function $m$ in the sense of

$$g(v) = E\{m(X) \mid X^T\theta_0 = v\}, \tag{1.2}$$

where the unknown parameter $\theta_0$, called the SIP coefficient, minimizes the mean squared approximation error defined in Section 2.1 and is used for simple interpretation once estimated; $X^T\theta_0$ is the latent SIP variable; and $g$, called the link prediction function, is a smooth but unknown function used for further data summary. Our method is therefore clearly interpretable regardless of the goodness of fit of the single-index model, making it much more relevant in applications.

We propose estimators of $\theta_0$ and $g$ based on a weakly dependent sample, a setting that includes many existing nonparametric time series models, and these estimators are (i) computationally expedient and (ii) theoretically reliable. In the existing literature both $\theta_0$ and $g$ have been estimated via kernel smoothing techniques, while we use polynomial spline smoothing. The greatest advantages of spline smoothing, as pointed out in Huang and Yang (2004) and Xue and Yang (2006b), are its simplicity and fast computation. Our proposed procedure involves two stages: estimation of $\theta_0$ by some $\sqrt{n}$-consistent $\hat\theta$ minimizing an empirical version of the mean squared error $R(\theta) = E\{Y - E(Y \mid X^T\theta)\}^2$; and spline smoothing of $Y$ on $X^T\hat\theta$ to obtain a cubic spline estimator $\hat g$ of $g$. The best single-index approximation to $m(x)$ is then $\hat m(x) = \hat g(x^T\hat\theta)$.

Under a geometrically strong mixing condition, strong consistency and $\sqrt{n}$-rate asymptotic normality of the estimator $\hat\theta$ of the SIP coefficient $\theta_0$ in (1.2) are obtained. Proposition 2.2 is the key to understanding the efficiency of the proposed estimator: it shows that the derivatives of the risk function up to order 2 are uniformly almost surely approximated by their empirical versions.

The practical performance of the SIP estimators is examined via Monte Carlo examples. The estimator of the SIP coefficient performs very well for data of both moderate and high dimension $d$, with sample size $n$ ranging from small to large; see Tables 1 and 2 and Figures 1 and 2. By taking advantage of spline smoothing and iterative optimization routines, one reduces the computational burden immensely for massive data sets. Table 2 reports the computing time of one simulation example on an ordinary PC, which shows that for massive data sets the SIP method is much faster than the MAVE method. For instance, the SIP estimation of a 200-dimensional $\theta_0$ from a sample of size 1000 takes on average a mere 2.84 seconds, while the MAVE method needs 2432.56 seconds on average to obtain comparable estimates. Hence, on account of criteria (i) and (ii), our method is indeed appealing. Applying the proposed SIP procedure to the river flow data of Iceland, we have obtained superior forecasts based on a 9-dimensional index selected by BIC; see Figure 5.

The rest of the paper is organized as follows. Section 2 gives details of the model spec- 
ification, proposed methods of estimation and main results. Section 3 describes the actual 
procedure to implement the estimation method. Section 4 reports our findings in an extensive 
simulation study. The proposed SIP model and the estimation procedure are applied in Section 
5 to the river flow data of Iceland. Most of the technical proofs are contained in the Appendix.

2. The Method and Main Results 

2.1. Identifiability and definition of the index coefficient 

It is obvious that without constraints, the SIP coefficient vector $\theta_0 = (\theta_{0,1}, \ldots, \theta_{0,d})^T$ is identified only up to a constant factor. Typically, one requires that $\|\theta_0\| = 1$, which entails that at least one of the coordinates $\theta_{0,1}, \ldots, \theta_{0,d}$ is nonzero. One could assume without loss of generality that $\theta_{0,d} > 0$, and the candidate $\theta_0$ would then belong to the upper unit hemisphere

$$S_+^{d-1} = \Big\{(\theta_1, \ldots, \theta_d) \,\Big|\, \sum_{p=1}^d \theta_p^2 = 1,\ \theta_d > 0\Big\}.$$
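The identification constraint can be made concrete: any nonzero direction with a nonzero last coordinate is mapped to $S_+^{d-1}$ by rescaling to unit length and flipping its sign if needed. A minimal sketch, assuming numpy; the helper name `to_upper_hemisphere` is hypothetical.

```python
import numpy as np

def to_upper_hemisphere(theta):
    """Rescale theta to unit Euclidean norm and flip its sign so that theta[-1] > 0.

    Hypothetical helper illustrating ||theta|| = 1, theta_d > 0; it rejects
    inputs whose last coordinate is zero, since those cannot be normalized this way.
    """
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm == 0.0 or theta[-1] == 0.0:
        raise ValueError("theta must be nonzero with a nonzero last coordinate")
    theta = theta / norm
    return theta if theta[-1] > 0 else -theta

# theta and -2 * theta describe the same single index
print(to_upper_hemisphere([1.0, 1.0]))    # [0.70710678 0.70710678]
print(to_upper_hemisphere([-2.0, -2.0]))  # same unit vector
```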

For a fixed $\theta = (\theta_1, \ldots, \theta_d)^T$, denote $X_\theta = X^T\theta$ and $X_{\theta,i} = X_i^T\theta$, $1 \le i \le n$. Let

$$m_\theta(X_\theta) = E(Y \mid X_\theta) = E\{m(X) \mid X_\theta\}. \tag{2.1}$$

Define the risk function of $\theta$ as

$$R(\theta) = E\big[\{Y - m_\theta(X_\theta)\}^2\big] = E\{m(X) - m_\theta(X_\theta)\}^2 + E\sigma^2(X), \tag{2.2}$$

which is uniquely minimized at $\theta_0 \in S_+^{d-1}$, i.e.

$$\theta_0 = \arg\min_{\theta \in S_+^{d-1}} R(\theta).$$

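Remark 2.1. Note that $S_+^{d-1}$ is not a compact set, so we introduce a cap-shaped subset of $S_+^{d-1}$:

$$S_c^{d-1} = \Big\{(\theta_1, \ldots, \theta_d) \,\Big|\, \sum_{p=1}^d \theta_p^2 = 1,\ \theta_d \ge \sqrt{1 - c^2}\Big\}, \qquad c \in (0, 1).$$

Clearly, for an appropriate choice of $c$, $\theta_0 \in S_c^{d-1}$, which we assume in the rest of the paper.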

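Denote $\theta_{-d} = (\theta_1, \ldots, \theta_{d-1})^T$. Since for fixed $\theta \in S_c^{d-1}$ the risk function $R(\theta)$ depends only on the first $d-1$ values in $\theta$, $R(\theta)$ is a function of $\theta_{-d}$,

$$R^*(\theta_{-d}) = R\Big(\theta_1, \theta_2, \ldots, \theta_{d-1}, \sqrt{1 - \|\theta_{-d}\|_2^2}\Big),$$

with well-defined score and Hessian matrices

$$S^*(\theta_{-d}) = \frac{\partial}{\partial\theta_{-d}} R^*(\theta_{-d}), \qquad H^*(\theta_{-d}) = \frac{\partial^2}{\partial\theta_{-d}\,\partial\theta_{-d}^T} R^*(\theta_{-d}). \tag{2.3}$$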

Assumption A1: The Hessian matrix $H^*(\theta_{0,-d})$ is positive definite and the risk function $R^*$ is locally convex at $\theta_{0,-d}$, i.e., for any $\varepsilon > 0$, there exists $\delta > 0$ such that $R^*(\theta_{-d}) - R^*(\theta_{0,-d}) < \delta$ implies $\|\theta_{-d} - \theta_{0,-d}\|_2 < \varepsilon$.
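The reparametrization behind $R^*$ and the score and Hessian in (2.3) can be checked numerically by central differences in $\theta_{-d}$. The sketch below is an illustration under assumptions: numpy is available, `full_theta` and `score_and_hessian` are hypothetical helpers, and the toy risk $\|\theta - \theta_0\|^2$ stands in for $R(\theta)$. For this toy risk and $\theta_0 = (1,1,1)^T/\sqrt{3}$, differentiating $\theta_d = \sqrt{1 - \|\theta_{-d}\|_2^2}$ twice gives $H^*(\theta_{0,-d}) = 2(I + \mathbf{1}\mathbf{1}^T)$, which is positive definite as Assumption A1 requires.

```python
import numpy as np

def full_theta(theta_md):
    """Map theta_{-d} to theta = (theta_{-d}, sqrt(1 - ||theta_{-d}||^2)) with theta_d > 0."""
    theta_md = np.asarray(theta_md, dtype=float)
    s = 1.0 - theta_md @ theta_md
    if s <= 0.0:
        raise ValueError("need ||theta_{-d}|| < 1")
    return np.append(theta_md, np.sqrt(s))

def score_and_hessian(risk, theta_md, eps=1e-4):
    """Central-difference approximations of S* and H* in (2.3), where R*(t) = risk(full_theta(t))."""
    theta_md = np.asarray(theta_md, dtype=float)
    k = theta_md.size
    r_star = lambda t: risk(full_theta(t))
    E = eps * np.eye(k)  # coordinate steps of size eps
    S = np.array([(r_star(theta_md + E[p]) - r_star(theta_md - E[p])) / (2 * eps)
                  for p in range(k)])
    H = np.empty((k, k))
    for p in range(k):
        for q in range(k):
            H[p, q] = (r_star(theta_md + E[p] + E[q]) - r_star(theta_md + E[p] - E[q])
                       - r_star(theta_md - E[p] + E[q]) + r_star(theta_md - E[p] - E[q])) / (4 * eps**2)
    return S, H

# Toy risk (hypothetical stand-in for R): squared distance to theta0 on the sphere
theta0 = np.ones(3) / np.sqrt(3)
S, H = score_and_hessian(lambda th: np.sum((th - theta0) ** 2), theta0[:-1])
print(S)
print(H)  # approximately [[4, 2], [2, 4]]
```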

2.2. Variable transformation 

Throughout this paper, we denote by $B_a^d = \{x \in \mathbb{R}^d \mid \|x\| \le a\}$ the $d$-dimensional ball with radius $a$ and center $0$, and by

$$C^{(k)}(B_a^d) = \big\{m \mid \text{the } k\text{th order partial derivatives of } m \text{ are continuous on } B_a^d\big\}$$

the space of $k$-th order smooth functions.

Assumption A2: The density function of $X$, $f(x) \in C^{(4)}(B_a^d)$, and there are constants $0 < c_f \le C_f$ such that

$$\frac{c_f}{\mathrm{Vol}_d(B_a^d)} \le f(x) \le \frac{C_f}{\mathrm{Vol}_d(B_a^d)},\ x \in B_a^d; \qquad f(x) = 0,\ x \notin B_a^d.$$

For a fixed $\theta$, define the transformed variables of the SIP variable $X_\theta$

$$U_\theta = F_d(X_\theta), \qquad U_{\theta,i} = F_d(X_{\theta,i}), \quad 1 \le i \le n, \tag{2.4}$$

in which $F_d$ is the rescaled centered Beta$\{(d+1)/2, (d+1)/2\}$ cumulative distribution function, i.e.

$$F_d(v) = \int_{-a}^{v} \frac{\Gamma(d+1)}{a\,\Gamma\{(d+1)/2\}^2\, 2^d}\Big(1 - \frac{t^2}{a^2}\Big)^{(d-1)/2} dt, \qquad v \in [-a, a]. \tag{2.5}$$
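For numerical work, $F_d$ can be evaluated by direct quadrature of (2.5). Substituting $s = (t + a)/(2a)$ shows that $F_d(v)$ equals the regularized incomplete beta function $I_{(v+a)/(2a)}\{(d+1)/2, (d+1)/2\}$, so a library routine for that function could be used instead. The sketch below assumes only numpy; the name `F_d` and the grid-based trapezoidal rule are illustrative choices, not the authors' code.

```python
import math
import numpy as np

def F_d(v, d, a, grid_size=4001):
    """Rescaled centered Beta{(d+1)/2, (d+1)/2} CDF on [-a, a], as in (2.5).

    The integral is approximated with the trapezoidal rule on a fixed grid;
    arguments outside [-a, a] are mapped to 0 or 1.
    """
    # log of the constant Gamma(d+1) / [a Gamma{(d+1)/2}^2 2^d], for stability at large d
    log_const = math.lgamma(d + 1) - math.log(a) - 2 * math.lgamma((d + 1) / 2) - d * math.log(2)
    t = np.linspace(-a, a, grid_size)
    dens = np.exp(log_const) * (1.0 - (t / a) ** 2) ** ((d - 1) / 2)
    cdf = np.concatenate(([0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(t))))
    return np.interp(v, t, cdf)

print(F_d(0.0, 5, 2.0))   # 0.5 by symmetry
print(F_d(0.5, 1, 1.0))   # d = 1 gives the uniform CDF (v + a) / (2a)
```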

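Remark 2.2. For any fixed $\theta$, the transformed variable $U_\theta$ in (2.4) has a quasi-uniform $[0,1]$ distribution. Let $f_\theta(u)$ be the probability density function of $U_\theta$; then for any $u \in [0,1]$

$$f_\theta(u) = \{F_d'(v)\}^{-1} f_{X_\theta}(v), \qquad v = F_d^{-1}(u),$$

in which $f_{X_\theta}(v) = \lim_{\Delta v \to 0} P(v \le X_\theta \le v + \Delta v)/\Delta v$. Noting that $X_\theta$ is exactly the projection of $X$ on $\theta$, let $D_v = \{x \mid v \le x^T\theta \le v + \Delta v\} \cap B_a^d$; then one has

$$P(v \le X_\theta \le v + \Delta v) = P(X \in D_v) = \int_{D_v} f(x)\,dx.$$

According to Assumption A2,

$$\frac{c_f\,\mathrm{Vol}_d(D_v)}{\mathrm{Vol}_d(B_a^d)} \le P(X \in D_v) \le \frac{C_f\,\mathrm{Vol}_d(D_v)}{\mathrm{Vol}_d(B_a^d)}.$$

On the other hand,

$$\mathrm{Vol}_d(D_v) = \mathrm{Vol}_{d-1}(J_v)\,\Delta v + o(\Delta v),$$

where $J_v = \{x \mid x^T\theta = v\} \cap B_a^d$. Note that the volume of $B_a^d$ is $\pi^{d/2}a^d/\Gamma(d/2+1)$ and

$$\mathrm{Vol}_{d-1}(J_v) = \pi^{(d-1)/2}(a^2 - v^2)^{(d-1)/2}/\Gamma\{(d+1)/2\};$$

thus, by the duplication formula for the Gamma function,

$$\frac{\mathrm{Vol}_{d-1}(J_v)}{\mathrm{Vol}_d(B_a^d)} = \frac{\Gamma(d+1)}{a\,\Gamma\{(d+1)/2\}^2\, 2^d}\Big(1 - \frac{v^2}{a^2}\Big)^{(d-1)/2} = F_d'(v).$$

Therefore $0 < c_f \le f_\theta(u) \le C_f < \infty$ for any fixed $\theta$ and $u \in [0,1]$.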
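When $X$ is uniform on $B_a^d$ (so $c_f = C_f = 1$ in Assumption A2), the bounds above collapse to $f_\theta \equiv 1$: $U_\theta$ is exactly uniform. A quick Monte Carlo check in $d = 3$, where $F_3$ is the Beta(2, 2) CDF $3s^2 - 2s^3$ with $s = (v + a)/(2a)$. The sample size, radius and direction are arbitrary illustrative choices, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, a = 200_000, 3, 1.5

# X uniform on the ball B_a^3: uniform direction times radius a * V^(1/d)
z = rng.standard_normal((n, d))
X = z / np.linalg.norm(z, axis=1, keepdims=True) * a * rng.uniform(size=(n, 1)) ** (1 / d)

theta = np.array([1.0, -2.0, 2.0]) / 3.0   # a unit vector
s = (X @ theta + a) / (2 * a)               # rescale X_theta from [-a, a] to [0, 1]
U = 3 * s**2 - 2 * s**3                     # F_3: Beta(2, 2) CDF in closed form

counts, _ = np.histogram(U, bins=10, range=(0.0, 1.0))
print(counts / n)  # every bin close to 0.1
```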

In terms of the transformed SIP variable $U_\theta$ in (2.4), we can rewrite the regression function $m_\theta$ in (2.1) for fixed $\theta$ as

$$\gamma_\theta(U_\theta) = E\{m(X) \mid U_\theta\} = E\{m(X) \mid X_\theta\} = m_\theta(X_\theta); \tag{2.6}$$

then the risk function $R(\theta)$ in (2.2) can be expressed as

$$R(\theta) = E\{Y - \gamma_\theta(U_\theta)\}^2 = E\{m(X) - \gamma_\theta(U_\theta)\}^2 + E\sigma^2(X). \tag{2.7}$$



2.3. Estimation Method 
Estimation of both $\theta_0$ and $g$ requires a degree of statistical smoothing, and all estimation here is carried out via cubic splines. In the following, we define the estimator $\hat\theta$ of $\theta_0$ and the estimator $\hat g$ of $g$.

To introduce the space of splines, we pre-select an integer $N = N_n$ with $n^{1/6} \ll N \ll n^{1/5}(\log n)^{-2/5}$; see Assumption A6 below. Divide $[0,1]$ into $(N+1)$ subintervals $J_j = [t_j, t_{j+1})$, $j = 0, \ldots, N-1$, $J_N = [t_N, 1]$, where $T := \{t_j\}_{j=1}^N$ is a sequence of equally spaced points, called interior knots, given as

$$t_{1-k} = \cdots = t_{-1} = t_0 = 0 < t_1 < \cdots < t_N < 1 = t_{N+1} = \cdots = t_{N+k},$$

in which $t_j = jh$, $j = 0, 1, \ldots, N+1$, and $h = 1/(N+1)$ is the distance between neighboring knots. The $j$-th B-spline of order $k$ for the knot sequence $T$, denoted by $B_{j,k}$, is defined recursively as in de Boor (2001).
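The recursive definition can be evaluated directly. Below is a sketch of the Cox-de Boor recursion for the knot sequence above, assuming numpy; `bspline_basis` is a hypothetical helper, not code from the paper. Columns are ordered $j = -k+1, \ldots, N$, so for $k = 4$ there are $N + 4$ cubic B-splines, matching the index range $j = -3, \ldots, N$ used in Section 3.

```python
import numpy as np

def bspline_basis(u, N, k=4):
    """Evaluate the N + k B-splines B_{j,k}, j = -k+1, ..., N, at points u in [0, 1].

    Knots: t_{1-k} = ... = t_0 = 0 < t_1 < ... < t_N < 1 = t_{N+1} = ... = t_{N+k},
    with t_j = j h and h = 1/(N + 1). Returns an array of shape (len(u), N + k).
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    h = 1.0 / (N + 1)
    t = np.concatenate([np.zeros(k), h * np.arange(1, N + 1), np.ones(k)])
    # order 1: indicators of the half-open knot intervals [t_j, t_{j+1})
    B = np.zeros((u.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (t[j] <= u) & (u < t[j + 1])
    # close the last nonempty interval on the right so that u = 1 is covered
    B[u == 1.0, np.flatnonzero(t[:-1] < t[1:])[-1]] = 1.0
    # raise the order one step at a time (terms with zero denominators are dropped)
    for m in range(2, k + 1):
        Bm = np.zeros((u.size, t.size - m))
        for j in range(t.size - m):
            d1, d2 = t[j + m - 1] - t[j], t[j + m] - t[j + 1]
            if d1 > 0:
                Bm[:, j] += (u - t[j]) / d1 * B[:, j]
            if d2 > 0:
                Bm[:, j] += (t[j + m] - u) / d2 * B[:, j + 1]
        B = Bm
    return B

B = bspline_basis(np.linspace(0, 1, 11), N=5)
print(B.shape, B.sum(axis=1))  # (11, 9) and rows summing to 1
```

Each row sums to one (partition of unity), the property behind $\sum_{k=2}^4\sum_j B_{j,k} \equiv 3$ used in the proof of Lemma A.1.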

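Denote by $\Gamma^{(k-2)} = \Gamma^{(k-2)}[0,1]$ the space of all $C^{(k-2)}[0,1]$ functions that are polynomials of degree $k-1$ on each interval. For fixed $\theta$, the cubic spline estimator $\hat\gamma_\theta$ of $\gamma_\theta$ and the related estimator $\hat m_\theta$ of $m_\theta$ are defined as

$$\hat\gamma_\theta(\cdot) = \arg\min_{\gamma(\cdot) \in \Gamma^{(2)}} \sum_{i=1}^n \{Y_i - \gamma(U_{\theta,i})\}^2, \qquad \hat m_\theta(v) = \hat\gamma_\theta\{F_d(v)\}. \tag{2.8}$$

Define the empirical risk function of $\theta$

$$\hat R(\theta) = n^{-1}\sum_{i=1}^n \{Y_i - \hat\gamma_\theta(U_{\theta,i})\}^2 = n^{-1}\sum_{i=1}^n \{Y_i - \hat m_\theta(X_{\theta,i})\}^2; \tag{2.9}$$

then the spline estimator of the SIP coefficient is defined as

$$\hat\theta = \arg\min_{\theta \in S_c^{d-1}} \hat R(\theta),$$

and the cubic spline estimator of $g$ is $\hat m_\theta$ with $\theta$ replaced by $\hat\theta$, i.e.

$$\hat g(v) = \Big[\arg\min_{\gamma(\cdot) \in \Gamma^{(2)}} \sum_{i=1}^n \{Y_i - \gamma(U_{\hat\theta,i})\}^2\Big]\{F_d(v)\}. \tag{2.10}$$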
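Definitions (2.8) to (2.10) amount to profile least squares: for each candidate $\theta$, regress $Y$ on the cubic B-spline basis evaluated at $U_{\theta,i}$ and record the mean squared residual. The sketch below is illustrative only and rests on several assumptions: numpy; compact self-contained copies of a Cox-de Boor basis and of $F_d$ (normalized numerically rather than with the Gamma constant); $d = 2$, so that $\theta = (\cos\phi, \sin\phi)$ with $\phi \in (0, \pi)$ covers the upper half circle and a grid search replaces the paper's optimizer; simulated data in the spirit of Example 1 with $\delta = 0$; and $N = 8$, an arbitrary choice rather than rule (3.5). All function names are hypothetical.

```python
import numpy as np

def F_d(v, d, a, grid=4001):
    """Rescaled Beta{(d+1)/2, (d+1)/2} CDF (2.5) via trapezoidal quadrature; 0/1 outside [-a, a]."""
    t = np.linspace(-a, a, grid)
    dens = (1.0 - (t / a) ** 2) ** ((d - 1) / 2)
    cdf = np.concatenate(([0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(t))))
    return np.interp(v, t, cdf / cdf[-1])

def bspline_basis(u, N, k=4):
    """Cox-de Boor evaluation of the N + k B-splines on equally spaced knots in [0, 1]."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = np.concatenate([np.zeros(k), np.arange(1, N + 1) / (N + 1), np.ones(k)])
    B = np.zeros((u.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (t[j] <= u) & (u < t[j + 1])
    B[u == 1.0, np.flatnonzero(t[:-1] < t[1:])[-1]] = 1.0
    for m in range(2, k + 1):
        Bm = np.zeros((u.size, t.size - m))
        for j in range(t.size - m):
            d1, d2 = t[j + m - 1] - t[j], t[j + m] - t[j + 1]
            if d1 > 0:
                Bm[:, j] += (u - t[j]) / d1 * B[:, j]
            if d2 > 0:
                Bm[:, j] += (t[j + m] - u) / d2 * B[:, j + 1]
        B = Bm
    return B

def empirical_risk(theta, X, Y, a, N):
    """R-hat(theta) in (2.9): mean squared residual of the cubic spline fit (2.8) of Y on U_theta."""
    U = F_d(X @ theta, X.shape[1], a)
    B = bspline_basis(U, N)
    coef, *_ = np.linalg.lstsq(B, Y, rcond=None)   # least squares over the spline space
    return np.mean((Y - B @ coef) ** 2)

rng = np.random.default_rng(1)
n, N = 300, 8
Z = rng.standard_normal((3 * n, 2))
X = Z[np.all(np.abs(Z) <= 2.5, axis=1)][:n]        # truncation to [-2.5, 2.5]^2 by rejection
s = X[:, 0] + X[:, 1]
Y = s + 4 * np.exp(-s**2) + 0.3 * rng.standard_normal(n)
a = np.quantile(np.linalg.norm(X, axis=1), 0.95)   # radius as in Step 1 of Section 3

phis = np.linspace(0.01, np.pi - 0.01, 400)         # theta_2 = sin(phi) > 0 on this grid
risks = [empirical_risk(np.array([np.cos(p), np.sin(p)]), X, Y, a, N) for p in phis]
phi_hat = phis[int(np.argmin(risks))]
theta_hat = np.array([np.cos(phi_hat), np.sin(phi_hat)])
print(theta_hat)  # close to (1, 1) / sqrt(2)
```

A grid search is only practical for $d = 2$; for larger $d$ the paper minimizes $\hat R^*$ with a gradient-based routine using the analytic score of Lemma 3.1.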

2.4. Asymptotic results 

Before giving the main theorems, we state some other assumptions. 

Assumption A3: The regression function $m \in C^{(4)}(B_a^d)$ for some $a > 0$.

Assumption A4: The noise $\varepsilon$ satisfies $E(\varepsilon \mid X) = 0$, $E(\varepsilon^2 \mid X) = 1$, and there exists a positive constant $M$ such that $\sup_{x \in B_a^d} E(|\varepsilon|^3 \mid X = x) < M$. The standard deviation function $\sigma(x)$ is continuous on $B_a^d$ and

$$0 < c_\sigma \le \inf_{x \in B_a^d}\sigma(x) \le \sup_{x \in B_a^d}\sigma(x) \le C_\sigma < \infty.$$

Assumption A5: There exist positive constants $K_0$ and $\lambda_0$ such that $\alpha(n) \le K_0 e^{-\lambda_0 n}$ holds for all $n$, with the $\alpha$-mixing coefficient for $\{Z_i = (X_i^T, \varepsilon_i)\}_{i=1}^n$ defined as

$$\alpha(k) = \sup_{B \in \sigma\{Z_s, s \le t\},\ C \in \sigma\{Z_s, s \ge t+k\}} |P(B \cap C) - P(B)P(C)|, \qquad k \ge 1.$$

Assumption A6: The number of interior knots $N$ satisfies $n^{1/6} \ll N \ll n^{1/5}(\log n)^{-2/5}$.

Remark 2.3. Assumptions A3 and A4 are typical in the nonparametric smoothing literature; see, for instance, Härdle (1990), Fan and Gijbels (1996), Xia, Tong, Li and Zhu (2002). By the result of Pham (1986), a geometrically ergodic time series is a strongly mixing sequence. Therefore, Assumption A5 is suitable for (1.1) as a time series model under the aforementioned assumptions.

We now state our main results in the next two theorems. 

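Theorem 1. Under Assumptions A1-A6, one has

$$\hat\theta_{-d} \to \theta_{0,-d}, \quad a.s. \tag{2.11}$$

Proof. Denote by $(\Omega, \mathcal{F}, \mathcal{P})$ the probability space on which all $\{(X_i^T, Y_i)\}_{i=1}^\infty$ are defined. By Proposition 2.2 given at the end of this section,

$$\sup_{\theta_{-d}:\ \theta \in S_c^{d-1}} \big|\hat R^*(\theta_{-d}) - R^*(\theta_{-d})\big| \to 0, \quad a.s. \tag{2.12}$$

So for any $\delta > 0$ and $\omega \in \Omega$, there exists an integer $n_0(\omega)$ such that when $n > n_0(\omega)$, $\hat R^*(\theta_{0,-d}, \omega) - R^*(\theta_{0,-d}) < \delta/2$. Note that $\hat\theta_{-d} = \hat\theta_{-d}(\omega)$ is the minimizer of $\hat R^*(\theta_{-d}, \omega)$, so $\hat R^*(\hat\theta_{-d}(\omega), \omega) - R^*(\theta_{0,-d}) < \delta/2$. Using (2.12), there exists $n_1(\omega)$ such that when $n > n_1(\omega)$, $R^*(\hat\theta_{-d}(\omega)) - \hat R^*(\hat\theta_{-d}(\omega), \omega) < \delta/2$. Thus, when $n > \max\{n_0(\omega), n_1(\omega)\}$,

$$R^*(\hat\theta_{-d}(\omega)) - R^*(\theta_{0,-d}) < \delta/2 + \hat R^*(\hat\theta_{-d}(\omega), \omega) - R^*(\theta_{0,-d}) < \delta/2 + \delta/2 = \delta.$$

According to Assumption A1, $R^*$ is locally convex at $\theta_{0,-d}$, so for any $\varepsilon > 0$ and any $\omega$, if $R^*(\hat\theta_{-d}(\omega)) - R^*(\theta_{0,-d}) < \delta$, then $\|\hat\theta_{-d}(\omega) - \theta_{0,-d}\| < \varepsilon$ for $n$ large enough, which implies the strong consistency.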

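Theorem 2. Under Assumptions A1-A6, one has

$$\sqrt{n}\,(\hat\theta_{-d} - \theta_{0,-d}) \xrightarrow{d} N\{0, \Sigma(\theta_0)\},$$

where $\Sigma(\theta_0) = \{H^*(\theta_{0,-d})\}^{-1}\Psi(\theta_0)\{H^*(\theta_{0,-d})\}^{-1}$, $H^*(\theta_{0,-d}) = (h_{pq})_{p,q=1}^{d-1}$ and $\Psi(\theta_0) = (\psi_{pq})_{p,q=1}^{d-1}$ with

$$\begin{aligned}
h_{pq} ={}& -2E\big[\{\dot\gamma_p\dot\gamma_q + \gamma_{\theta_0}\ddot\gamma_{p,q}\}(U_{\theta_0})\big] + 2\theta_{0,q}\theta_{0,d}^{-1}E\big[\{\dot\gamma_p\dot\gamma_d + \gamma_{\theta_0}\ddot\gamma_{p,d}\}(U_{\theta_0})\big] \\
&+ 2\theta_{0,d}^{-3}E\big[(\gamma_{\theta_0}\dot\gamma_d)(U_{\theta_0})\big]\big\{(\theta_{0,d}^2 + \theta_{0,p}^2)I_{\{p=q\}} + \theta_{0,p}\theta_{0,q}I_{\{p\ne q\}}\big\} \\
&+ 2\theta_{0,p}\theta_{0,d}^{-1}E\big[\{\dot\gamma_d\dot\gamma_q + \gamma_{\theta_0}\ddot\gamma_{d,q}\}(U_{\theta_0})\big] - 2\theta_{0,p}\theta_{0,q}\theta_{0,d}^{-2}E\big[\{\dot\gamma_d^2 + \gamma_{\theta_0}\ddot\gamma_{d,d}\}(U_{\theta_0})\big],
\end{aligned}$$

$$\psi_{pq} = 4E\Big[\big\{(\dot\gamma_p - \theta_{0,p}\theta_{0,d}^{-1}\dot\gamma_d)(\dot\gamma_q - \theta_{0,q}\theta_{0,d}^{-1}\dot\gamma_d)\big\}(U_{\theta_0})\{\gamma_{\theta_0}(U_{\theta_0}) - Y\}^2\Big],$$

in which $\dot\gamma_p$ and $\ddot\gamma_{p,q}$ are the values of $\partial\gamma_\theta/\partial\theta_p$ and $\partial^2\gamma_\theta/\partial\theta_p\partial\theta_q$ taken at $\theta = \theta_0$, for any $p, q = 1, 2, \ldots, d$, and $\gamma_\theta$ is given in (2.6).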



Remark 2.4. Consider the generalized linear model (GLM) $Y = g(X^T\theta_0) + \sigma(X)\varepsilon$, where $g$ is a known link function. Let $\tilde\theta$ be the nonlinear least squares estimator of $\theta_0$ in the GLM. Theorem 2 shows that under Assumptions A1-A6, the asymptotic distribution of $\hat\theta_{-d}$ is the same as that of $\tilde\theta_{-d}$. This implies that our proposed SIP estimator $\hat\theta_{-d}$ is as efficient as if the true link function $g$ were known.

The next two propositions play an important role in our proof of the main results. Proposition 2.1 establishes the uniform convergence rate of the derivatives of $\hat\gamma_\theta$ up to order 2 to those of $\gamma_\theta$ in $\theta$. Proposition 2.2 shows that the derivatives of the risk function up to order 2 are uniformly almost surely approximated by their empirical versions.



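Proposition 2.1. Under Assumptions A2-A6, with probability 1,

$$\sup_{\theta \in S_c^{d-1}}\ \sup_{u \in [0,1]} \big|\hat\gamma_\theta(u) - \gamma_\theta(u)\big| = O\big\{(nh)^{-1/2}\log n + h^4\big\}, \tag{2.13}$$

$$\sup_{\theta \in S_c^{d-1}}\ \max_{1 \le p \le d}\ \max_{1 \le i \le n} \Big|\frac{\partial}{\partial\theta_p}\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\}\Big| = O\Big(\sqrt{\frac{\log n}{nh^3}} + h^3\Big), \tag{2.14}$$

$$\sup_{\theta \in S_c^{d-1}}\ \max_{1 \le p,q \le d}\ \max_{1 \le i \le n} \Big|\frac{\partial^2}{\partial\theta_p\,\partial\theta_q}\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\}\Big| = O\Big(\sqrt{\frac{\log n}{nh^5}} + h^2\Big). \tag{2.15}$$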



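Proposition 2.2. Under Assumptions A2-A6, one has for $k = 0, 1, 2$

$$\sup_{\theta_{-d}:\ \theta \in S_c^{d-1}} \Big|\frac{\partial^k}{\partial\theta_{-d}^k}\hat R^*(\theta_{-d}) - \frac{\partial^k}{\partial\theta_{-d}^k}R^*(\theta_{-d})\Big| = o(1), \quad a.s.$$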

Proofs of Theorem 2 and Propositions 2.1 and 2.2 are given in the Appendix.
3. Implementation 

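In this section, we describe the actual procedure to implement the estimation of $\theta_0$ and $g$. We first introduce some new notation. For fixed $\theta$, write the B-spline matrix as

$$\mathbf{B}_\theta = \{B_{j,4}(U_{\theta,i})\}_{i=1,\,j=-3}^{n,\,N}$$

and

$$\mathbf{P}_\theta = \mathbf{B}_\theta(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\mathbf{B}_\theta^T \tag{3.1}$$

as the projection matrix onto the cubic spline space $\Gamma_{n,\theta}^{(2)}$. For any $p = 1, \ldots, d$, denote

$$\dot{\mathbf{B}}_p = \frac{\partial}{\partial\theta_p}\mathbf{B}_\theta, \qquad \dot{\mathbf{P}}_p = \frac{\partial}{\partial\theta_p}\mathbf{P}_\theta$$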



SINGLE-INDEX PREDICTION MODEL 



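as the first order partial derivatives of $\mathbf{B}_\theta$ and $\mathbf{P}_\theta$ with respect to $\theta_p$. Let $\hat S^*(\theta_{-d})$ be the score vector of $\hat R^*(\theta_{-d})$, i.e.

$$\hat S^*(\theta_{-d}) = \frac{\partial}{\partial\theta_{-d}}\hat R^*(\theta_{-d}). \tag{3.2}$$

The next lemma provides the exact form of $\hat S^*(\theta_{-d})$.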

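Lemma 3.1. For the score vector of $\hat R^*$ defined in (3.2), one has

$$\hat S^*(\theta_{-d}) = -n^{-1}\big\{Y^T\dot{\mathbf{P}}_pY - \theta_p\theta_d^{-1}Y^T\dot{\mathbf{P}}_dY\big\}_{p=1}^{d-1}, \tag{3.3}$$

where, for any $p = 1, 2, \ldots, d$,

$$Y^T\dot{\mathbf{P}}_pY = 2Y^T(\mathbf{I} - \mathbf{P}_\theta)\dot{\mathbf{B}}_p(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\mathbf{B}_\theta^TY, \tag{3.4}$$

with $\dot{\mathbf{B}}_p = \big[\{B_{j,3}(U_{\theta,i}) - B_{j+1,3}(U_{\theta,i})\}F_d'(X_{\theta,i})\,h^{-1}X_{i,p}\big]_{i=1,\,j=-3}^{n,\,N}$ and

$$F_d'(v) = \frac{\Gamma(d+1)}{a\,\Gamma\{(d+1)/2\}^2\,2^d}\Big(1 - \frac{v^2}{a^2}\Big)^{(d-1)/2}.$$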



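Proof. For any $p = 1, 2, \ldots, d$, the derivatives of B-splines in de Boor (2001) imply

$$\dot{\mathbf{B}}_p = \frac{\partial}{\partial\theta_p}\{B_{j,4}(U_{\theta,i})\}_{i=1,\,j=-3}^{n,\,N} = \Big\{\frac{d}{du}B_{j,4}(U_{\theta,i})\,\frac{\partial U_{\theta,i}}{\partial\theta_p}\Big\}_{i=1,\,j=-3}^{n,\,N} = \big[\{B_{j,3}(U_{\theta,i}) - B_{j+1,3}(U_{\theta,i})\}F_d'(X_{\theta,i})\,h^{-1}X_{i,p}\big]_{i=1,\,j=-3}^{n,\,N}.$$

Next, note that

$$\dot{\mathbf{P}}_p = \dot{\mathbf{B}}_p(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\mathbf{B}_\theta^T + \mathbf{B}_\theta\Big\{\frac{\partial}{\partial\theta_p}(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\Big\}\mathbf{B}_\theta^T + \mathbf{B}_\theta(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\dot{\mathbf{B}}_p^T.$$

Since

$$\mathbf{0} = \frac{\partial}{\partial\theta_p}\big\{(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\mathbf{B}_\theta^T\mathbf{B}_\theta\big\} = \Big\{\frac{\partial}{\partial\theta_p}(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\Big\}\mathbf{B}_\theta^T\mathbf{B}_\theta + (\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\frac{\partial}{\partial\theta_p}(\mathbf{B}_\theta^T\mathbf{B}_\theta)$$

and $\frac{\partial}{\partial\theta_p}(\mathbf{B}_\theta^T\mathbf{B}_\theta) = \dot{\mathbf{B}}_p^T\mathbf{B}_\theta + \mathbf{B}_\theta^T\dot{\mathbf{B}}_p$, thus

$$\frac{\partial}{\partial\theta_p}(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1} = -(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\big(\dot{\mathbf{B}}_p^T\mathbf{B}_\theta + \mathbf{B}_\theta^T\dot{\mathbf{B}}_p\big)(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}.$$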






LI WANG AND LIJIAN YANG 



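Hence

$$\dot{\mathbf{P}}_p = (\mathbf{I} - \mathbf{P}_\theta)\dot{\mathbf{B}}_p(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\mathbf{B}_\theta^T + \mathbf{B}_\theta(\mathbf{B}_\theta^T\mathbf{B}_\theta)^{-1}\dot{\mathbf{B}}_p^T(\mathbf{I} - \mathbf{P}_\theta).$$

Thus, (3.4) follows immediately, since the two terms of $Y^T\dot{\mathbf{P}}_pY$ are transposes of each other. Finally, (3.3) follows from $\hat R(\theta) = n^{-1}Y^T(\mathbf{I} - \mathbf{P}_\theta)Y$ and $\partial\theta_d/\partial\theta_p = -\theta_p/\theta_d$ for $\theta_d = \sqrt{1 - \|\theta_{-d}\|_2^2}$.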

In practice, the estimation is implemented via the following procedure.

Step 1. Standardize the predictor vectors $\{X_i\}_{i=1}^n$ and, for each fixed $\theta \in S_c^{d-1}$, obtain the CDF-transformed variables $\{U_{\theta,i}\}_{i=1}^n$ of the SIP variable $\{X_{\theta,i}\}_{i=1}^n$ through formula (2.5), where the radius $a$ is taken to be the 95th percentile of $\{\|X_i\|\}_{i=1}^n$.

Step 2. Compute the quadratic and cubic B-spline bases at each value $U_{\theta,i}$, where the number of interior knots $N$ is

$$N = \min\big\{\lfloor c_1 n^{1/5.5}\rfloor,\ c_2\big\}. \tag{3.5}$$

Step 3. Find the estimator $\hat\theta$ of $\theta_0$ by minimizing $\hat R^*$ through the port optimization routine, with $(0, \ldots, 0, 1)^T$ as the initial value and the empirical score vector $\hat S^*$ in (3.3). If $d < n$, one can instead take as the initial value the simple LSE (without intercept) for the data $\{Y_i, X_i\}_{i=1}^n$, rescaled to unit length with its last coordinate set positive.

Step 4. Obtain the spline estimator $\hat g$ of $g$ by plugging $\hat\theta$ obtained in Step 3 into (2.10).
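The data preparation of Step 1 and the start value of Step 3 are simple to code. A minimal sketch, assuming numpy and assuming that "standardize" means coordinatewise centering and scaling to unit variance (the text does not specify); `sip_initial_value` is a hypothetical name.

```python
import numpy as np

def sip_initial_value(X, Y):
    """Steps 1 and 3 (initialization): standardize X, choose the radius a, form a start value.

    Returns (Xs, a, theta_init). If d < n, theta_init is the no-intercept LSE of Y
    on the standardized predictors, rescaled to unit length with a positive last
    coordinate; otherwise it is the default start (0, ..., 0, 1).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)           # coordinatewise standardization (assumed)
    a = np.quantile(np.linalg.norm(Xs, axis=1), 0.95)   # 95th percentile of ||X_i||
    if d < n:
        beta, *_ = np.linalg.lstsq(Xs, Y, rcond=None)   # LSE without intercept
        theta = beta / np.linalg.norm(beta)
        theta = theta if theta[-1] > 0 else -theta      # last coordinate positive
    else:
        theta = np.zeros(d)
        theta[-1] = 1.0
    return Xs, a, theta
```

For a noise-free linear response $Y = X_s\beta$ the start value is exactly $\beta/\|\beta\|$ (when $\beta_d > 0$), which is a convenient sanity check.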

Remark 3.1. In (3.5), $c_1$ and $c_2$ are positive integers and $\lfloor u\rfloor$ denotes the integer part of $u$. The choice of the tuning parameter $c_1$ makes little difference for a large sample, and according to our asymptotic theory there is no optimal way to set these constants. We recommend using $c_1 = 1$ to save computation for massive data sets. The first term ensures Assumption A6. The additional constraint $c_2$ can be taken from 5 to 10 for smooth monotonic or smooth unimodal regression, and $c_2 > 10$ if $g$ has many local minima and maxima, which is very unlikely in applications.
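Rule (3.5) in code, as a sketch with the recommended $c_1 = 1$ and $c_2 = 5$ as defaults (the function name is hypothetical):

```python
import math

def n_interior_knots(n, c1=1, c2=5):
    """Number of interior knots from (3.5): N = min{floor(c1 * n**(1/5.5)), c2}."""
    return min(math.floor(c1 * n ** (1 / 5.5)), c2)

for n in (100, 300, 1000, 10**6):
    print(n, n_interior_knots(n))
```

With these defaults $N$ stays at 2 or 3 for $n$ between 100 and about 2000, and is capped at $c_2 = 5$ once $n$ exceeds about 7000.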

4. Simulations 

In this section, we carry out two simulations to illustrate the finite-sample behavior of our SIP estimation method. The number of interior knots is computed according to (3.5) with $c_1 = 1$, $c_2 = 5$. All of our code has been written in R.

Example 1. Consider the model in Xia, Li, Tong and Zhang (2004)

$$Y = m(X) + \sigma_0\varepsilon, \qquad \sigma_0 = 0.3, 0.5, \qquad \varepsilon \sim N(0, 1),$$

where $X = (X_1, X_2)^T \sim N(0, I_2)$, truncated to $[-2.5, 2.5]^2$, and

$$m(x) = x_1 + x_2 + 4\exp\{-(x_1 + x_2)^2\} + \delta(x_1^2 + x_2^2)^{1/2}. \tag{4.1}$$

If $\delta = 0$, then the underlying true function $m$ is a single-index function, i.e., $m(X) = \sqrt{2}X^T\theta_0 + 4\exp\{-2(X^T\theta_0)^2\}$, where $\theta_0 = (1, 1)^T/\sqrt{2}$. When $\delta \ne 0$, $m$ is not a genuine single-index function. An impression of the bivariate function $m$ for $\delta = 0$ and $\delta = 1$ can be gained from Figure 1 (a) and (b), respectively.


Table 1: Report of Example 1 (values outside/inside parentheses: δ = 0 / δ = 1)

σ_0   n     θ_0       BIAS                  SD                   MSE                  Average MSE
0.3   100   θ_{0,1}   5e-04 (-0.00236)      0.00825 (0.02093)    7e-05 (0.00044)      7e-05 (0.00043)
            θ_{0,2}   -6e-04 (0.00174)      0.00826 (0.02083)    7e-05 (0.00043)
0.3   300   θ_{0,1}   -0.00124 (-0.00129)   0.00383 (0.01172)    2e-05 (0.00014)      2e-05 (0.00014)
            θ_{0,2}   -0.00124 (0.00110)    0.00383 (0.01160)    2e-05 (0.00013)
0.5   100   θ_{0,1}   0.00121 (-0.00137)    0.01346 (0.02257)    0.00018 (0.00051)    0.00018 (0.00051)
            θ_{0,2}   -0.00147 (0.00062)    0.01349 (0.02309)    0.00018 (0.00052)
0.5   300   θ_{0,1}   -0.00204 (-0.00229)   0.00639 (0.01205)    4e-05 (0.00015)      4e-05 (0.00015)
            θ_{0,2}   0.00197 (0.00208)     0.00637 (0.01190)    4e-05 (0.00014)





For $\delta = 0, 1$, we draw 100 random realizations of each sample size $n = 50, 100, 300$, respectively. To demonstrate how close our SIP estimator is to the true index parameter $\theta_0$, Table 1 lists the sample mean (MEAN), bias (BIAS), standard deviation (SD) and mean squared error (MSE) of the estimates of $\theta_0$, and the average MSE of both directions. From this table, we find that the SIP estimators are very accurate in both cases $\delta = 0$ and $\delta = 1$, which shows that our proposed method is robust against deviation from the single-index model. As expected, when the sample size increases, the SIP coefficient is more accurately estimated. Moreover, for $n = 100, 300$, the average MSE is roughly inversely proportional to $n$.

Example 2. Consider the heteroscedastic regression model (1.1) with

$$m(X) = \ldots, \qquad \sigma(X) = \sigma_0\,\frac{\ldots}{5 + \exp(\|X\|/\sqrt{d})}, \tag{4.2}$$

in which $X_i = (X_{i,1}, \ldots, X_{i,d})^T$ and $\varepsilon_i$, $i = 1, \ldots, n$, are i.i.d. $N(0, 1)$, and $\sigma_0 = 0.2$. In our simulation, the true parameter is $\theta_0 = (1, 1, 0, \ldots, 0, 1)^T/\sqrt{3}$ for different sample sizes $n$ and dimensions $d$. The
superior performance of the SIP estimators is borne out in comparison with the MAVE of Xia, Tong, Li and Zhu (2002). We also investigate the behavior of the SIP estimators in the previously unexplored cases in which the sample size $n$ is smaller than or equal to $d$, for instance, $n = 100$, $d = 100, 200$ and $n = 200$, $d = 200, 400$. The average MSEs of the $d$ dimensions are listed in Table 2, from which we see that the performance of the SIP estimators is quite reasonable, and in most of the scenarios with $n \le d$ the SIP estimators still work astonishingly well where the MAVEs become unreliable. For $n = 100$, $d = 10, 50, 100, 200$, the estimates of the link prediction function $g$ from model (4.2) are plotted in Figure 2, and they are rather satisfactory even when the dimension $d$ exceeds the sample size $n$.

Theorem 1 indicates that $\hat\theta_{-d}$ is strongly consistent for $\theta_{0,-d}$. To see the convergence, we run 100 replications, and in each replication the value of $\|\hat\theta - \theta_0\|/\sqrt{d}$ is computed. Figure 3 plots the kernel density estimates of the 100 values of $\|\hat\theta - \theta_0\|/\sqrt{d}$ in Example 2, for dimensions $d = 10, 50, 100, 200$. The four line types correspond to the four sample sizes: dotted-dashed line ($n = 100$), dotted line ($n = 200$), dashed line ($n = 500$) and solid line ($n = 1000$). As the sample size increases, the errors become closer to 0 with a narrower spread, confirming the conclusions of Theorem 1.

Lastly, Table 2 reports the average computing time for Example 2 to generate one sample of size $n$ and perform the SIP or MAVE procedure on the same ordinary Pentium IV PC. From Table 2 one sees that our proposed SIP estimator is much faster than MAVE. As we expected, the computing time of MAVE is extremely sensitive to the sample size. For very large $d$, MAVE becomes unstable to the point of breaking down in four cases.

5. An application 

In this section we demonstrate the proposed SIP model through the river flow data of the Jokulsa Eystri River of Iceland, from January 1, 1972 to December 31, 1974. There are 1096 observations; see Tong (1990). The response variable is the daily river flow ($Y_t$) of the Jokulsa Eystri River, measured in cubic meters per second. The exogenous variables are temperature ($X_t$) in degrees Celsius and daily precipitation ($Z_t$) in millimeters, collected at the meteorological station at Hveravellir.

This data set was analyzed earlier through threshold autoregressive (TAR) models by Tong, Thanoon and Gudmundsson (1985) and Tong (1990), and through nonlinear additive autoregressive models with exogenous variables (NAARX) by Chen and Tsay (1993). Figure 4 shows the plots of the three time series, from which some nonlinear and non-stationary features of the river flow series are evident. To make these series stationary, we remove the trend by a simple quadratic spline regression; these trends (dashed lines) are shown in Figure 4. By an abuse of notation, we shall continue to use $X_t$, $Y_t$, $Z_t$ to denote the detrended series.
In the analysis, we pre-select all the lagged values in the last 7 days (1 week), i.e., the predictor pool is $\{Y_{t-1}, \ldots, Y_{t-7}, X_t, X_{t-1}, \ldots, X_{t-7}, Z_t, Z_{t-1}, \ldots, Z_{t-7}\}$. Using BIC, similar to Huang and Yang (2004), for our proposed spline SIP model with 3 interior knots, the following 9 explanatory variables are selected from the above set: $\{Y_{t-1}, \ldots, Y_{t-4}, X_t, X_{t-1}, X_{t-2}, Z_t, Z_{t-1}\}$. Based on this selection, we fit the SIP model again and obtain the estimate of the SIP coefficient $\hat\theta = (-0.877, 0.382, -0.208, 0.125, -0.046, -0.034, 0.004, -0.126, 0.079)^T$. Figure 5 (a) and (b) display the fitted river flow series and the residuals against time.
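Constructing the candidate pool is mechanical: 7 lags of $Y$ plus lags 0 to 7 of $X$ and $Z$ give 23 candidate predictors. A sketch, assuming the detrended series are held in three equal-length numpy arrays; the function name and column labels are hypothetical, and the BIC search itself is not shown.

```python
import numpy as np

def lagged_pool(y, x, z, max_lag=7):
    """Candidate predictors {Y_{t-1..t-7}, X_{t..t-7}, Z_{t..t-7}} for t = max_lag, ..., T-1.

    Rows whose lags would reach before the first observation are dropped,
    so the design has T - max_lag rows and 3 * max_lag + 2 columns.
    """
    y, x, z = (np.asarray(s, dtype=float) for s in (y, x, z))
    T = y.size
    cols, names = [], []
    for lag in range(1, max_lag + 1):          # the response enters with lags 1..max_lag
        cols.append(y[max_lag - lag:T - lag])
        names.append(f"Y(t-{lag})")
    for series, label in ((x, "X"), (z, "Z")):
        for lag in range(0, max_lag + 1):      # exogenous series also enter at lag 0
            cols.append(series[max_lag - lag:T - lag])
            names.append(f"{label}(t)" if lag == 0 else f"{label}(t-{lag})")
    return y[max_lag:], np.column_stack(cols), names

# hypothetical toy series standing in for the detrended data
t = np.arange(30, dtype=float)
resp, D, names = lagged_pool(np.sin(t), np.cos(t), t)
print(D.shape)  # (23, 23)
```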

Next we examine the forecasting performance of the SIP method. We start by computing the SIP estimator using only the observations of the first two years; then we perform the out-of-sample rolling forecast of the entire third year. The observed values of the exogenous variables are used in the forecast. Figure 5 (c) shows these SIP out-of-sample rolling forecasts. For the purpose of comparison, we also try the MAVE method, in which the same predictor vector is selected by using BIC. The mean squared prediction error is 60.52 for the SIP model, 61.25 for MAVE, 65.62 for NAARX, 66.67 for TAR and 81.99 for the linear regression model; see Chen and Tsay (1993). Among these five models, the SIP model produces the best forecasts.

6. Conclusion 

In this paper we propose a robust SIP model for stochastic regression under weak dependence, regardless of whether the underlying function is exactly a single-index function. The proposed spline estimator of the index coefficient not only possesses the usual strong consistency and $\sqrt{n}$-rate asymptotic normality, but is also as efficient as if the true link function $g$ were known. By taking advantage of spline smoothing and an iterative optimization method, the proposed procedure is much faster than the MAVE method. The procedure is especially powerful for large sample size $n$ and high dimension $d$, and unlike the MAVE method, the performance of the SIP estimator remains satisfactory in the case $d > n$.
Appendix

A.1. Preliminaries

In this section, we introduce some properties of the B-spline. 

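Lemma A.1. There exist constants $0 < c < C$ such that for any spline $\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}$ with B-splines up to order $k = 4$,

$$c\,h^{1/r}\|\alpha\|_r \le \Big\|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}\Big\|_r \le (3^{r-1}h)^{1/r}\|\alpha\|_r, \qquad 1 \le r < \infty,$$

$$c\,h^{1/r}\|\alpha\|_r \le \Big\|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}\Big\|_r \le (4h)^{1/r}\|\alpha\|_r, \qquad 0 < r < 1,$$

where $\alpha := (\alpha_{-1,2}, \alpha_{0,2}, \ldots, \alpha_{N,2}, \ldots, \alpha_{N,4})^T$. In particular, under Assumption A2, for any fixed $\theta$,

$$c\,h^{1/2}\|\alpha\|_2 \le \Big\|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}\Big\|_{2,\theta} \le C\,h^{1/2}\|\alpha\|_2.$$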



Proof. It follows from the B-spline property on page 96 of de Boor (2001) that $\sum_{k=2}^4\sum_{j=-k+1}^N B_{j,k} \equiv 3$ on $[0,1]$. So the right inequality follows immediately for $r = \infty$, with bound $3\|\alpha\|_\infty$. When $1 \le r < \infty$, we use Hölder's inequality to find

$$\Big|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}\Big| \le \Big(\sum_{k=2}^4\sum_{j=-k+1}^N |\alpha_{j,k}|^rB_{j,k}\Big)^{1/r}\Big(\sum_{k=2}^4\sum_{j=-k+1}^N B_{j,k}\Big)^{1-1/r} = 3^{1-1/r}\Big(\sum_{k=2}^4\sum_{j=-k+1}^N |\alpha_{j,k}|^rB_{j,k}\Big)^{1/r}.$$

Since all the knots are equally spaced, $\int_0^1 B_{j,k}(u)\,du \le h$, and the right inequality follows from

$$\int_0^1\Big|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}(u)\Big|^r du \le 3^{r-1}h\|\alpha\|_r^r.$$

When $0 < r < 1$, we have

$$\Big|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}\Big|^r \le \sum_{k=2}^4\sum_{j=-k+1}^N |\alpha_{j,k}|^rB_{j,k}^r.$$

Since $\int B_{j,k}^r(u)\,du \le t_{j+k} - t_j \le kh \le 4h$, we obtain

$$\int_0^1\Big|\sum_{k=2}^4\sum_{j=-k+1}^N \alpha_{j,k}B_{j,k}(u)\Big|^r du \le \|\alpha\|_r^r\max_{j,k}\int_{-\infty}^{\infty}B_{j,k}^r(u)\,du \le 4h\|\alpha\|_r^r,$$

and the right inequality follows in this case as well. For the left inequalities, we derive from Theorem 5.4.2 of DeVore and Lorentz (1993) that

$$|\alpha_{j,k}|^r \le Ch^{-1}\int_{t_j}^{t_{j+k}}\Big|\sum_{j'=-k+1}^N \alpha_{j',k}B_{j',k}(u)\Big|^r du$$

for any $0 < r < \infty$. Since each $u \in [0,1]$ appears in at most $k$ intervals $(t_j, t_{j+k})$, adding up these inequalities, we obtain

$$\|\alpha\|_r^r \le Ch^{-1}\sum_{k=2}^4\sum_{j=-k+1}^N\int_{t_j}^{t_{j+k}}\Big|\sum_{j'=-k+1}^N \alpha_{j',k}B_{j',k}(u)\Big|^r du \le 4Ch^{-1}\sum_{k=2}^4\int_0^1\Big|\sum_{j'=-k+1}^N \alpha_{j',k}B_{j',k}(u)\Big|^r du.$$

The left inequality follows.
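For $r = 2$ and a single order $k = 4$, the norm equivalence in Lemma A.1 says that the eigenvalues of $h^{-1}G$, with $G_{jj'} = \int_0^1 B_{j,4}B_{j',4}\,du$, stay in a fixed interval as $N$ grows. The sketch below checks this numerically under stated assumptions: numpy, a uniform density ($f_\theta \equiv 1$), trapezoidal quadrature on a fine grid, and a self-contained copy of a hypothetical Cox-de Boor helper. The upper bound 1 follows because the row sums of $G$ equal $\int_0^1 B_{j,4} \le h$ (partition of unity and Gershgorin's theorem); only a single order is examined, since B-splines of different orders on the same knots are linearly dependent.

```python
import numpy as np

def bspline_basis(u, N, k=4):
    """Cox-de Boor evaluation of the N + k B-splines on equally spaced knots in [0, 1]."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    t = np.concatenate([np.zeros(k), np.arange(1, N + 1) / (N + 1), np.ones(k)])
    B = np.zeros((u.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (t[j] <= u) & (u < t[j + 1])
    B[u == 1.0, np.flatnonzero(t[:-1] < t[1:])[-1]] = 1.0
    for m in range(2, k + 1):
        Bm = np.zeros((u.size, t.size - m))
        for j in range(t.size - m):
            d1, d2 = t[j + m - 1] - t[j], t[j + m] - t[j + 1]
            if d1 > 0:
                Bm[:, j] += (u - t[j]) / d1 * B[:, j]
            if d2 > 0:
                Bm[:, j] += (t[j + m] - u) / d2 * B[:, j + 1]
        B = Bm
    return B

def gram_eigenvalues(N, k=4, grid=20001):
    """Eigenvalues of h^{-1} G, G_{jj'} = int_0^1 B_{j,k} B_{j',k} du, by the trapezoidal rule."""
    u = np.linspace(0.0, 1.0, grid)
    B = bspline_basis(u, N, k)
    w = np.full(grid, 1.0 / (grid - 1))
    w[[0, -1]] /= 2                          # trapezoid end weights
    G = B.T @ (B * w[:, None])
    return np.linalg.eigvalsh(G * (N + 1))   # 1/h = N + 1

for N in (5, 10, 20, 40):
    ev = gram_eigenvalues(N)
    print(N, round(ev.min(), 4), round(ev.max(), 4))
```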

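For any functions $\phi$ and $\varphi$, define the empirical inner product and the empirical norm as

$$\langle\phi, \varphi\rangle_{n,\theta} = n^{-1}\sum_{i=1}^n \phi(U_{\theta,i})\varphi(U_{\theta,i}), \qquad \|\phi\|_{2,n,\theta}^2 = n^{-1}\sum_{i=1}^n \phi^2(U_{\theta,i}).$$

In addition, if the functions $\phi$, $\varphi$ are $L_2[0,1]$-integrable, define the theoretical inner product and its corresponding theoretical $L_2$ norm as

$$\langle\phi, \varphi\rangle_\theta = \int_0^1 \phi(u)\varphi(u)f_\theta(u)\,du, \qquad \|\phi\|_{2,\theta}^2 = \int_0^1 \phi^2(u)f_\theta(u)\,du.$$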



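Lemma A.2. Under Assumptions A2, A5 and A6, with probability 1,

$$\sup_{\theta \in S_c^{d-1}}\ \max_{k,k'=2,3,4}\ \max_{1 \le j,j' \le N}\big|\langle B_{j,k}, B_{j',k'}\rangle_{n,\theta} - \langle B_{j,k}, B_{j',k'}\rangle_\theta\big| = O\Big(\frac{\log n}{\sqrt{nN}}\Big).$$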



Proof. We only prove the case $k = k' = 4$; all other cases are similar. Let

$$\zeta_{\theta,j,j',i} = B_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i}) - EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i}),$$

with the second moment

$$E\zeta_{\theta,j,j',i}^2 = E\big[B_{j,4}^2(U_{\theta,i})B_{j',4}^2(U_{\theta,i})\big] - \{EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})\}^2,$$

where $\{EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})\}^2 \sim N^{-2}$ and $E[B_{j,4}^2(U_{\theta,i})B_{j',4}^2(U_{\theta,i})] \sim N^{-1}$ by Assumption A2. Hence $E\zeta_{\theta,j,j',i}^2 \sim N^{-1}$. The $k$-th moment is given by

$$E|\zeta_{\theta,j,j',i}|^k = E\big|B_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i}) - EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})\big|^k \le 2^{k-1}\big\{E|B_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})|^k + |EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})|^k\big\},$$

where $|EB_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})|^k \sim N^{-k}$ and $E|B_{j,4}(U_{\theta,i})B_{j',4}(U_{\theta,i})|^k \sim N^{-1}$. Thus, there exists a constant $C > 0$ such that $E|\zeta_{\theta,j,j',i}|^k \le C^{k-2}k!\,E\zeta_{\theta,j,j',i}^2$, so Cramér's condition is satisfied with Cramér's constant $c_* = C$. By Bernstein's inequality (see Bosq (1998), Theorem 1.4, page 31), we have for $k = 3$
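$$P\Big\{\Big|n^{-1}\sum_{i=1}^n \zeta_{\theta,j,j',i}\Big| \ge \delta_n\Big\} \le a_1\exp\Big(-\frac{q\delta_n^2}{25m_2^2 + 5c_*\delta_n}\Big) + a_2(3)\,\alpha\Big(\Big[\frac{n}{q+1}\Big]\Big)^{6/7},$$

where

$$\delta_n = \delta\frac{\log n}{\sqrt{nN}}, \qquad a_1 = 2\frac{n}{q} + 2\Big\{1 + \frac{\delta^2(nN)^{-1}\log^2 n}{25m_2^2 + 5c_*\delta_n}\Big\}, \qquad m_2^2 \sim N^{-1},$$

$$a_2(3) = 11n\Big(1 + \frac{5m_3^{6/7}}{\delta_n}\Big), \qquad m_3 = \max_{1 \le i \le n}\|\zeta_{\theta,j,j',i}\|_3 \le cN^{-1/3}.$$

Observe that $5c_*\delta_n = o(1)$ by Assumption A6; then by taking $q$ such that $[n/(q+1)] \ge c_0\log n$ and $q \ge c_1 n/\log n$ for some constants $c_0, c_1$, one has $a_1 = O(n/q) = O(\log n)$ and $a_2(3) = o(n^2)$, via Assumption A6 again. Assumption A5 yields that

$$\alpha\Big(\Big[\frac{n}{q+1}\Big]\Big)^{6/7} \le \Big\{K_0\exp\Big(-\lambda_0\Big[\frac{n}{q+1}\Big]\Big)\Big\}^{6/7} \le Cn^{-6\lambda_0c_0/7}.$$

Thus, for fixed $\theta \in S_c^{d-1}$, when $n$ is large enough,

$$P\Big\{\Big|n^{-1}\sum_{i=1}^n \zeta_{\theta,j,j',i}\Big| > \delta_n\Big\} \le c\log n\exp\{-c_2\delta^2\log n\} + Cn^{2-6\lambda_0c_0/7}. \tag{A.1}$$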



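We divide each range of $\theta_p$, $p = 1, 2, \ldots, d-1$, into $n^{6/(d-1)}$ equally spaced intervals with disjoint endpoints $-1 = \theta_{p,0} < \theta_{p,1} < \cdots < \theta_{p,M_n} = 1$, for $p = 1, \ldots, d-1$. Projecting these small cylinders onto $S_c^{d-1}$, the radius of each patch $\Lambda_r$, $r = 1, \ldots, M_n$, is bounded by $cM_n^{-1}$. Denote the projection of the $M_n$ points as

$$\theta_r = \left(\theta_{r,-d}^T, \sqrt{1 - \|\theta_{r,-d}\|_2^2}\right)^T, \qquad r = 0, 1, \ldots, M_n.$$

Employing the discretization method, $\sup_{\theta \in S_c^{d-1}} \max_{1\le j,j'\le N} \left|n^{-1}\sum_{i=1}^n \zeta_{\theta,j,j',i}\right|$ is bounded by

$$\sup_{0\le r\le M_n}\max_{1\le j,j'\le N}\left|n^{-1}\sum_{i=1}^n \zeta_{\theta_r,j,j',i}\right| + \sup_{0\le r\le M_n}\max_{1\le j,j'\le N}\sup_{\theta\in\Lambda_r}\left|n^{-1}\sum_{i=1}^n\left(\zeta_{\theta,j,j',i} - \zeta_{\theta_r,j,j',i}\right)\right|. \qquad (A.2)$$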



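By (A.1) and Assumption A6, there exists a large enough value $\delta > 0$ such that

$$P\left\{\left|n^{-1}\sum_{i=1}^n \zeta_{\theta_r,j,j',i}\right| > \delta_n\right\} \le n^{-10},$$

which implies that

$$\sum_{n=1}^\infty P\left\{\sup_{0\le r\le M_n}\max_{1\le j,j'\le N}\left|n^{-1}\sum_{i=1}^n\zeta_{\theta_r,j,j',i}\right| > \delta_n\right\} \le 2\sum_{n=1}^\infty N^2 M_n n^{-10} \le C\sum_{n=1}^\infty n^{-3} < \infty.$$

Thus, the Borel–Cantelli lemma entails that

$$\sup_{0\le r\le M_n}\max_{1\le j,j'\le N}\left|n^{-1}\sum_{i=1}^n\zeta_{\theta_r,j,j',i}\right| = O\left(\frac{\log n}{\sqrt{nN}}\right), \quad \text{a.s.} \qquad (A.3)$$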
Employing the Lipschitz continuity of the cubic B-spline, one has with probability 1

$$\sup_{0\le r\le M_n}\max_{1\le j,j'\le N}\sup_{\theta\in\Lambda_r}\left|n^{-1}\sum_{i=1}^n\left\{\zeta_{\theta,j,j',i} - \zeta_{\theta_r,j,j',i}\right\}\right| = O\left(M_n^{-1}h^{-1}\right). \qquad (A.4)$$

Therefore Assumption A2, (A.2), (A.3) and (A.4) lead to the desired result.

Denote by $\Gamma = \Gamma^{(0)} \cup \Gamma^{(1)} \cup \Gamma^{(2)}$ the space of all linear, quadratic and cubic spline functions on $[0, 1]$. We establish the uniform rate at which the empirical inner product approximates the theoretical inner product for all B-splines $B_{j,k}$ with $k = 2, 3, 4$.



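Lemma A.3. Under Assumptions A2, A5 and A6, one has

$$A_n = \sup_{\theta\in S_c^{d-1}}\sup_{\gamma_1,\gamma_2\in\Gamma}\left|\frac{\langle\gamma_1,\gamma_2\rangle_{n,\theta} - \langle\gamma_1,\gamma_2\rangle_\theta}{\|\gamma_1\|_{2,\theta}\|\gamma_2\|_{2,\theta}}\right| = O\left\{(nh)^{-1/2}\log n\right\}, \quad \text{a.s.} \qquad (A.5)$$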



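Proof. Denote, without loss of generality,

$$\gamma_1 = \sum_{k=2}^4\sum_{j=-k+1}^N\alpha_{j,k}B_{j,k}, \qquad \gamma_2 = \sum_{k=2}^4\sum_{j=-k+1}^N\beta_{j,k}B_{j,k},$$

for any two $3(N+3)$-vectors

$$\boldsymbol\alpha = \left(\alpha_{-1,2}, \alpha_{0,2}, \ldots, \alpha_{N,2}, \ldots, \alpha_{N,4}\right), \qquad \boldsymbol\beta = \left(\beta_{-1,2}, \beta_{0,2}, \ldots, \beta_{N,2}, \ldots, \beta_{N,4}\right).$$

Then for fixed $\theta$

$$\langle\gamma_1,\gamma_2\rangle_{n,\theta} = \frac1n\sum_{i=1}^n\left\{\sum_{k=2}^4\sum_{j=-k+1}^N\alpha_{j,k}B_{j,k}(U_{\theta,i})\right\}\left\{\sum_{k=2}^4\sum_{j=-k+1}^N\beta_{j,k}B_{j,k}(U_{\theta,i})\right\} = \sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\alpha_{j,k}\beta_{j',k'}\langle B_{j,k},B_{j',k'}\rangle_{n,\theta},$$

$$\|\gamma_1\|_{2,\theta}^2 = \sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\alpha_{j,k}\alpha_{j',k'}\langle B_{j,k},B_{j',k'}\rangle_\theta, \qquad \|\gamma_2\|_{2,\theta}^2 = \sum_{k=2}^4\sum_{j=-k+1}^N\sum_{k'=2}^4\sum_{j'=-k'+1}^N\beta_{j,k}\beta_{j',k'}\langle B_{j,k},B_{j',k'}\rangle_\theta.$$

According to Lemma A.1, one has for any $\theta \in S_c^{d-1}$,

$$c_1h\|\boldsymbol\alpha\|_2^2 \le \|\gamma_1\|_{2,\theta}^2 \le c_2h\|\boldsymbol\alpha\|_2^2, \qquad c_1h\|\boldsymbol\beta\|_2^2 \le \|\gamma_2\|_{2,\theta}^2 \le c_2h\|\boldsymbol\beta\|_2^2,$$

$$c_1h\|\boldsymbol\alpha\|_2\|\boldsymbol\beta\|_2 \le \|\gamma_1\|_{2,\theta}\|\gamma_2\|_{2,\theta} \le c_2h\|\boldsymbol\alpha\|_2\|\boldsymbol\beta\|_2.$$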
Because $B_{j,k}B_{j',k'} \equiv 0$ whenever $|j - j'| > 3$, the Cauchy–Schwarz inequality gives $\sum_{k,k'=2}^4\sum_{|j-j'|\le 3}|\alpha_{j,k}||\beta_{j',k'}| \le C\|\boldsymbol\alpha\|_2\|\boldsymbol\beta\|_2$. Hence

$$A_n \le \frac{C}{c_1 h}\sup_{\theta\in S_c^{d-1}}\max_{\substack{k,k'=2,3,4\\1\le j,j'\le N}}\left|\frac1n\sum_{i=1}^n\left\{B_{j,k}(U_{\theta,i})B_{j',k'}(U_{\theta,i}) - E\,B_{j,k}(U_{\theta,i})B_{j',k'}(U_{\theta,i})\right\}\right|,$$

which, together with Lemma A.2, implies (A.5).



A.2. Proof of Proposition 2.1

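For any fixed $\theta$, we write the response $\mathbf{Y}^T = (Y_1, \ldots, Y_n)$ as the sum of a signal vector $\boldsymbol\gamma_\theta$, a parametric noise vector $\mathbf{E}_\theta$ and a systematic noise vector $\mathbf{E}$, i.e.,

$$\mathbf{Y} = \boldsymbol\gamma_\theta + \mathbf{E}_\theta + \mathbf{E},$$

in which the vectors $\boldsymbol\gamma_\theta^T = \{\gamma_\theta(U_{\theta,1}), \ldots, \gamma_\theta(U_{\theta,n})\}$, $\mathbf{E}^T = \{\sigma(\mathbf{X}_1)\varepsilon_1, \ldots, \sigma(\mathbf{X}_n)\varepsilon_n\}$ and $\mathbf{E}_\theta^T = \{m(\mathbf{X}_1) - \gamma_\theta(U_{\theta,1}), \ldots, m(\mathbf{X}_n) - \gamma_\theta(U_{\theta,n})\}$.

Remark A.1. If $m$ is a genuine single-index function, then $\mathbf{E}_{\theta_0} = \mathbf{0}$; thus the proposed SIP model is exactly the single-index model.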

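Let $\Gamma^{(2)}_{n,\theta}$ be the cubic spline space spanned by $\{B_{j,4}(U_{\theta,i})\}_{i=1}^n$, $-3 \le j \le N$, for fixed $\theta$. Projecting $\mathbf{Y}$ onto $\Gamma^{(2)}_{n,\theta}$ yields that

$$\hat{\boldsymbol\gamma}_\theta = \{\hat\gamma_\theta(U_{\theta,1}), \ldots, \hat\gamma_\theta(U_{\theta,n})\}^T = \mathrm{Proj}_{\Gamma^{(2)}_{n,\theta}}\boldsymbol\gamma_\theta + \mathrm{Proj}_{\Gamma^{(2)}_{n,\theta}}\mathbf{E}_\theta + \mathrm{Proj}_{\Gamma^{(2)}_{n,\theta}}\mathbf{E},$$

where $\hat\gamma_\theta$ is given in (2.8). We break the cubic spline estimation error $\hat\gamma_\theta(u_\theta) - \gamma_\theta(u_\theta)$ into a bias term $\tilde\gamma_\theta(u_\theta) - \gamma_\theta(u_\theta)$ and two noise terms $\tilde\varepsilon_\theta(u_\theta)$ and $\hat\varepsilon_\theta(u_\theta)$:

$$\hat\gamma_\theta(u_\theta) - \gamma_\theta(u_\theta) = \{\tilde\gamma_\theta(u_\theta) - \gamma_\theta(u_\theta)\} + \tilde\varepsilon_\theta(u_\theta) + \hat\varepsilon_\theta(u_\theta), \qquad (A.6)$$

where

$$\tilde\gamma_\theta(u) = \left[\{B_{j,4}(u)\}_{j=-3}^{N}\right]^T\mathbf{V}_{n,\theta}^{-1}\left\{\langle\boldsymbol\gamma_\theta, B_{j,4}\rangle_{n,\theta}\right\}_{j=-3}^N, \qquad (A.7)$$

$$\tilde\varepsilon_\theta(u) = \left[\{B_{j,4}(u)\}_{j=-3}^{N}\right]^T\mathbf{V}_{n,\theta}^{-1}\left\{\langle\mathbf{E}_\theta, B_{j,4}\rangle_{n,\theta}\right\}_{j=-3}^N, \qquad (A.8)$$

$$\hat\varepsilon_\theta(u) = \left[\{B_{j,4}(u)\}_{j=-3}^{N}\right]^T\mathbf{V}_{n,\theta}^{-1}\left\{\langle\mathbf{E}, B_{j,4}\rangle_{n,\theta}\right\}_{j=-3}^N. \qquad (A.9)$$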
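In the above, we denote by $\mathbf{V}_{n,\theta}$ the empirical inner product matrix of the cubic B-spline basis and, similarly, by $\mathbf{V}_\theta$ the theoretical inner product matrix:

$$\mathbf{V}_{n,\theta} = \left(\langle B_{j,4}, B_{j',4}\rangle_{n,\theta}\right)_{j,j'=-3}^N, \qquad \mathbf{V}_\theta = \left(\langle B_{j,4}, B_{j',4}\rangle_\theta\right)_{j,j'=-3}^N. \qquad (A.10)$$

In Lemma A.5 we provide uniform upper bounds for $\|\mathbf{V}_{n,\theta}^{-1}\|_\infty$ and $\|\mathbf{V}_\theta^{-1}\|_\infty$. Before that, we first describe a special case of Theorem 13.4.3 in DeVore and Lorentz (1993).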
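Lemma A.4. If a bi-infinite matrix $\mathbf{A}$ with bandwidth $r$ has a bounded inverse $\mathbf{A}^{-1}$ on $\ell^2$, and $\kappa = \kappa(\mathbf{A}) := \|\mathbf{A}\|_2\|\mathbf{A}^{-1}\|_2$ is the condition number of $\mathbf{A}$, then $\|\mathbf{A}^{-1}\|_\infty \le 2c_0(1-\nu)^{-1}$, with

$$c_0 = \nu^{-2r}\|\mathbf{A}^{-1}\|_2, \qquad \nu = \left(\frac{\kappa^2 - 1}{\kappa^2 + 1}\right)^{1/(4r)}.$$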
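To see what Lemma A.4 delivers, the sketch below evaluates the bound $2c_0(1-\nu)^{-1}$ for a finite symmetric tridiagonal matrix and compares it with the exact $\infty$-norm of the inverse. A finite section is used as a stand-in for the bi-infinite matrix, and the test matrix (4 on the diagonal, 1 next to it, bandwidth $r = 1$) is an arbitrary illustrative choice; the function name `demko_bound` is ours.

```python
import numpy as np

def demko_bound(A, r):
    """Bound 2*c0/(1 - nu) of Lemma A.4 for a matrix of bandwidth r."""
    norm2 = np.linalg.norm(A, 2)                    # largest singular value
    inv_norm2 = np.linalg.norm(np.linalg.inv(A), 2)
    kappa = norm2 * inv_norm2                       # spectral condition number
    nu = ((kappa**2 - 1.0) / (kappa**2 + 1.0)) ** (1.0 / (4 * r))
    c0 = nu ** (-2 * r) * inv_norm2
    return 2.0 * c0 / (1.0 - nu)

size = 60
A = 4.0 * np.eye(size) + np.eye(size, k=1) + np.eye(size, k=-1)

actual = np.linalg.norm(np.linalg.inv(A), np.inf)  # maximum absolute row sum
bound = demko_bound(A, r=1)
print(f"||A^-1||_inf = {actual:.4f}, Lemma A.4 bound = {bound:.4f}")
```

Here the exact norm is below 0.5 (the matrix is strictly diagonally dominant) while the bound is far larger; what the proof of Lemma A.5 uses is only that the bound depends on $\kappa$, $r$ and $\|\mathbf{A}^{-1}\|_2$, and not on the dimension of the matrix.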



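Lemma A.5. Under Assumptions A2, A5 and A6, there exist constants $0 < c_V < C_V$ such that $c_VN^{-1}\|\mathbf{w}\|_2^2 \le \mathbf{w}^T\mathbf{V}_\theta\mathbf{w} \le C_VN^{-1}\|\mathbf{w}\|_2^2$ and

$$c_VN^{-1}\|\mathbf{w}\|_2^2 \le \mathbf{w}^T\mathbf{V}_{n,\theta}\mathbf{w} \le C_VN^{-1}\|\mathbf{w}\|_2^2, \quad \text{a.s.}, \qquad (A.11)$$

with matrices $\mathbf{V}_\theta$ and $\mathbf{V}_{n,\theta}$ defined in (A.10). In addition, there exists a constant $C > 0$ such that

$$\sup_{\theta\in S_c^{d-1}}\left\|\mathbf{V}_{n,\theta}^{-1}\right\|_\infty \le CN, \ \text{a.s.}, \qquad \sup_{\theta\in S_c^{d-1}}\left\|\mathbf{V}_\theta^{-1}\right\|_\infty \le CN. \qquad (A.12)$$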



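Proof. First we compute the lower and upper bounds for the eigenvalues of $\mathbf{V}_{n,\theta}$. Let $\mathbf{w}$ be any $(N+4)$-vector and denote $\gamma_{\mathbf w}(u) = \sum_{j=-3}^N w_jB_{j,4}(u)$; then $\mathbf{B}_\theta\mathbf{w} = \{\gamma_{\mathbf w}(U_{\theta,1}), \ldots, \gamma_{\mathbf w}(U_{\theta,n})\}^T$, and the definition of $A_n$ in (A.5) from Lemma A.3 entails that

$$\|\gamma_{\mathbf w}\|_{2,\theta}^2(1 - A_n) \le \mathbf{w}^T\mathbf{V}_{n,\theta}\mathbf{w} = \|\gamma_{\mathbf w}\|_{2,n,\theta}^2 \le \|\gamma_{\mathbf w}\|_{2,\theta}^2(1 + A_n). \qquad (A.13)$$

Using Theorem 5.4.2 of DeVore and Lorentz (1993) and Assumption A2, one obtains that

$$c_fc\frac{\|\mathbf w\|_2^2}{N} \le \|\gamma_{\mathbf w}\|_{2,\theta}^2 = \mathbf{w}^T\mathbf{V}_\theta\mathbf{w} \le C_fC\frac{\|\mathbf w\|_2^2}{N}, \qquad (A.14)$$

which, together with (A.13), yields

$$c_fcN^{-1}\|\mathbf w\|_2^2(1 - A_n) \le \mathbf w^T\mathbf V_{n,\theta}\mathbf w \le C_fCN^{-1}\|\mathbf w\|_2^2(1 + A_n). \qquad (A.15)$$

Now the order of $A_n$ in (A.5), together with (A.14) and (A.15), implies (A.11), in which $c_V = c_fc$ and $C_V = C_fC$. Next, denote by $\lambda_{\max}(\mathbf V_{n,\theta})$ and $\lambda_{\min}(\mathbf V_{n,\theta})$ the maximum and minimum eigenvalues of $\mathbf V_{n,\theta}$; simple algebra and (A.11) entail that

$$C_VN^{-1} \ge \|\mathbf V_{n,\theta}\|_2 = \lambda_{\max}(\mathbf V_{n,\theta}), \qquad \|\mathbf V_{n,\theta}^{-1}\|_2 = \lambda_{\min}^{-1}(\mathbf V_{n,\theta}) \le c_V^{-1}N, \quad \text{a.s.},$$

thus

$$\kappa = \|\mathbf V_{n,\theta}\|_2\|\mathbf V_{n,\theta}^{-1}\|_2 = \frac{\lambda_{\max}(\mathbf V_{n,\theta})}{\lambda_{\min}(\mathbf V_{n,\theta})} \le C_Vc_V^{-1} < \infty, \quad \text{a.s.}$$

Meanwhile, let $\mathbf w_j$ be the $(N+4)$-vector with all zeros except the $j$-th element being 1, $j = -3, \ldots, N$. Then clearly

$$\mathbf w_j^T\mathbf V_{n,\theta}\mathbf w_j = \frac1n\sum_{i=1}^nB_{j,4}^2(U_{\theta,i}) = \|B_{j,4}\|_{2,n,\theta}^2, \qquad \|\mathbf w_j\|_2 = 1, \quad -3\le j\le N,$$

and in particular

$$\mathbf w_0^T\mathbf V_{n,\theta}\mathbf w_0 \le \lambda_{\max}(\mathbf V_{n,\theta})\|\mathbf w_0\|_2^2 = \lambda_{\max}(\mathbf V_{n,\theta}), \qquad \mathbf w_{-3}^T\mathbf V_{n,\theta}\mathbf w_{-3} \ge \lambda_{\min}(\mathbf V_{n,\theta})\|\mathbf w_{-3}\|_2^2 = \lambda_{\min}(\mathbf V_{n,\theta}).$$

This, together with (A.5), yields that

$$\kappa = \frac{\lambda_{\max}(\mathbf V_{n,\theta})}{\lambda_{\min}(\mathbf V_{n,\theta})} \ge \frac{\mathbf w_0^T\mathbf V_{n,\theta}\mathbf w_0}{\mathbf w_{-3}^T\mathbf V_{n,\theta}\mathbf w_{-3}} = \frac{\|B_{0,4}\|_{2,n,\theta}^2}{\|B_{-3,4}\|_{2,n,\theta}^2} \ge \frac{\|B_{0,4}\|_{2,\theta}^2(1-A_n)}{\|B_{-3,4}\|_{2,\theta}^2(1+A_n)},$$

which leads to $\kappa \ge C > 1$, a.s., because the definition of the B-spline and Assumption A2 ensure that $\|B_{0,4}\|_{2,\theta}^2 \ge C_0\|B_{-3,4}\|_{2,\theta}^2$ for some constant $C_0 > 1$. Next, applying Lemma A.4 with

$$\nu = \left(\frac{\kappa^2-1}{\kappa^2+1}\right)^{1/16} \qquad\text{and}\qquad c_0 = \nu^{-8}\|\mathbf V_{n,\theta}^{-1}\|_2,$$

one gets

$$\|\mathbf V_{n,\theta}^{-1}\|_\infty \le 2\nu^{-8}\|\mathbf V_{n,\theta}^{-1}\|_2(1-\nu)^{-1} \le 2\nu^{-8}c_V^{-1}N(1-\nu)^{-1} \le CN, \quad \text{a.s.}$$

Hence part one of (A.12) follows.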
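The scaling in Lemma A.5 can be checked numerically. The sketch below builds the cubic B-spline basis $\{B_{j,4}\}_{j=-3}^N$ on $N$ equally spaced interior knots via the Cox–de Boor recursion, forms $\mathbf V_{n,\theta} = n^{-1}\mathbf B_\theta^T\mathbf B_\theta$, and reports $N\lambda_{\min}$, $N\lambda_{\max}$ and $N^{-1}\|\mathbf V_{n,\theta}^{-1}\|_\infty$ for several $N$. It assumes i.i.d. uniform $U_{\theta,i}$ for illustration, and the helper name `cubic_bspline_basis` is our own, not the paper's.

```python
import numpy as np

def cubic_bspline_basis(u, n_interior):
    """Cubic B-splines B_{j,4}, j = -3..N, on [0, 1] via the Cox-de Boor recursion.

    Uses n_interior = N equally spaced interior knots and boundary knots of
    multiplicity 4; returns an array of shape (len(u), N + 4).
    """
    k = 4
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    t = np.concatenate([np.zeros(k - 1), inner, np.ones(k - 1)])
    u = np.asarray(u, dtype=float)
    m = len(t) - 1
    B = np.zeros((u.size, m))
    for j in range(m):                        # order-1 splines: interval indicators
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    last = max(j for j in range(m) if t[j] < t[j + 1])
    B[u == 1.0, last] = 1.0                   # close the last interval at u = 1
    for order in range(2, k + 1):             # raise the order one step at a time
        B_new = np.zeros((u.size, m - order + 1))
        for j in range(m - order + 1):
            d1 = t[j + order - 1] - t[j]
            d2 = t[j + order] - t[j + 1]
            left = (u - t[j]) / d1 * B[:, j] if d1 > 0 else 0.0
            right = (t[j + order] - u) / d2 * B[:, j + 1] if d2 > 0 else 0.0
            B_new[:, j] = left + right
        B = B_new
    return B

rng = np.random.default_rng(1)
n = 20_000
u = rng.uniform(0.0, 1.0, size=n)             # assumed i.i.d. uniform U_{theta,i}

results = {}
for N in (10, 20, 40):
    B = cubic_bspline_basis(u, N)
    V = B.T @ B / n                           # empirical Gram matrix V_{n,theta}
    eig = np.linalg.eigvalsh(V)               # eigenvalues in ascending order
    inv_inf = np.linalg.norm(np.linalg.inv(V), np.inf)
    results[N] = (N * eig[0], N * eig[-1], inv_inf / N)
    print(f"N={N:3d}  N*lmin={N * eig[0]:.4f}  N*lmax={N * eig[-1]:.4f}  "
          f"||V^-1||_inf/N={inv_inf / N:.2f}")
```

If the three printed quantities stay of the same order as $N$ doubles, that is consistent with (A.11) and (A.12): the eigenvalues of $\mathbf V_{n,\theta}$ are of exact order $N^{-1}$ and $\|\mathbf V_{n,\theta}^{-1}\|_\infty$ grows at most linearly in $N$.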

In the following, we denote by $Q_T(m)$ the 4-th order quasi-interpolant of $m$ corresponding to the knots $T$; see equation (4.12), page 146 of DeVore and Lorentz (1993). According to Theorem 7.7.4 of DeVore and Lorentz (1993), the following lemma holds.

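Lemma A.6. There exists a constant $C > 0$ such that for $0 \le k \le 2$ and $\gamma \in C^{(4)}[0,1]$,

$$\left\|\left\{\gamma - Q_T(\gamma)\right\}^{(k)}\right\|_\infty \le C\left\|\gamma^{(4)}\right\|_\infty h^{4-k}.$$

Lemma A.7. Under Assumptions A2, A3, A5 and A6, there exists an absolute constant $C > 0$ such that for the function $\tilde\gamma_\theta(u)$ in (A.7),

$$\sup_{\theta\in S_c^{d-1}}\left\|\frac{d^k}{du^k}\left(\tilde\gamma_\theta - \gamma_\theta\right)\right\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^{4-k}, \quad \text{a.s.}, \quad 0\le k\le 2. \qquad (A.16)$$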



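Proof. According to Theorem A.1 of Huang (2003), there exists an absolute constant $C > 0$ such that

$$\sup_{\theta\in S_c^{d-1}}\|\tilde\gamma_\theta - \gamma_\theta\|_\infty \le C\sup_{\theta\in S_c^{d-1}}\inf_{\gamma\in\Gamma^{(2)}}\|\gamma - \gamma_\theta\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^4, \quad\text{a.s.}, \qquad (A.17)$$

which proves (A.16) for the case $k = 0$. Applying Lemma A.6, one has for $0 \le k \le 2$

$$\sup_{\theta\in S_c^{d-1}}\left\|\frac{d^k}{du^k}\left\{Q_T(\gamma_\theta) - \gamma_\theta\right\}\right\|_\infty \le C\sup_{\theta\in S_c^{d-1}}\left\|\gamma_\theta^{(4)}\right\|_\infty h^{4-k} \le C\left\|m^{(4)}\right\|_\infty h^{4-k}. \qquad (A.18)$$

As a consequence of (A.17) and (A.18) for the case $k = 0$, one has

$$\sup_{\theta\in S_c^{d-1}}\left\|Q_T(\gamma_\theta) - \tilde\gamma_\theta\right\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^4, \quad\text{a.s.},$$

which, according to the differentiation of B-splines given in de Boor (2001), entails that

$$\sup_{\theta\in S_c^{d-1}}\left\|\frac{d^k}{du^k}\left\{Q_T(\gamma_\theta) - \tilde\gamma_\theta\right\}\right\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^{4-k}, \quad\text{a.s.}, \quad 0\le k\le 2. \qquad (A.19)$$

Combining (A.18) and (A.19) proves (A.16) for $k = 1, 2$.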
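The $h^4$ rate in (A.17) can be illustrated with a least-squares cubic spline fit of a smooth function on a dense, noiseless design. In the sketch below the target $\sin(2\pi u)$, the design and the knot numbers are illustrative choices, the least-squares projection plays the role of $\tilde\gamma_\theta$, and `cubic_bspline_basis` is our own Cox–de Boor helper (restated so the block runs on its own). Going from 19 to 39 interior knots halves $h$, so the ratio of sup-norm errors should be roughly $2^4 = 16$.

```python
import numpy as np

def cubic_bspline_basis(u, n_interior):
    """Cubic B-splines on [0, 1] with n_interior equally spaced interior knots (Cox-de Boor)."""
    k = 4
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    t = np.concatenate([np.zeros(k - 1), inner, np.ones(k - 1)])
    u = np.asarray(u, dtype=float)
    m = len(t) - 1
    B = np.zeros((u.size, m))
    for j in range(m):                        # order-1 splines: interval indicators
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    last = max(j for j in range(m) if t[j] < t[j + 1])
    B[u == 1.0, last] = 1.0                   # close the last interval at u = 1
    for order in range(2, k + 1):
        B_new = np.zeros((u.size, m - order + 1))
        for j in range(m - order + 1):
            d1 = t[j + order - 1] - t[j]
            d2 = t[j + order] - t[j + 1]
            left = (u - t[j]) / d1 * B[:, j] if d1 > 0 else 0.0
            right = (t[j + order] - u) / d2 * B[:, j + 1] if d2 > 0 else 0.0
            B_new[:, j] = left + right
        B = B_new
    return B

def lsq_spline_sup_error(f, n_interior, u):
    """Sup-norm error (over the design points) of the least-squares cubic spline fit of f."""
    B = cubic_bspline_basis(u, n_interior)
    coef, *_ = np.linalg.lstsq(B, f(u), rcond=None)
    return np.max(np.abs(B @ coef - f(u)))

u = np.linspace(0.0, 1.0, 8001)               # dense, noiseless design on [0, 1]
f = lambda x: np.sin(2 * np.pi * x)           # illustrative smooth target

err_coarse = lsq_spline_sup_error(f, 19, u)   # h = 1/20
err_fine = lsq_spline_sup_error(f, 39, u)     # h = 1/40
ratio = err_coarse / err_fine
print(f"h=1/20: {err_coarse:.2e}  h=1/40: {err_fine:.2e}  ratio={ratio:.1f}")
```

This checks only the case $k = 0$ and noiseless data; the derivative cases and the uniformity in $\theta$ in (A.16) are what the quasi-interpolant argument above adds.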



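Lemma A.8. Under Assumptions A1, A2, A4 and A5, there exists an absolute constant $C > 0$ such that

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|\frac{\partial}{\partial\theta_p}\left\{\tilde\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}_{i=1}^n\right\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^3, \quad\text{a.s.}, \qquad (A.20)$$

$$\sup_{1\le p,q\le d}\sup_{\theta\in S_c^{d-1}}\left\|\frac{\partial^2}{\partial\theta_p\partial\theta_q}\left\{\tilde\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}_{i=1}^n\right\|_\infty \le C\left\|m^{(4)}\right\|_\infty h^2, \quad\text{a.s.} \qquad (A.21)$$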



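Proof. According to the definition of $\tilde\gamma_\theta$ in (A.7), and the fact that $Q_T(\gamma_\theta)$ is a cubic spline on the knots $T$,

$$\left\{\left(Q_T(\gamma_\theta) - \tilde\gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n = \mathbf P_\theta\left\{\left(Q_T(\gamma_\theta) - \gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n,$$

where $\mathbf P_\theta$ denotes the projection matrix onto $\Gamma^{(2)}_{n,\theta}$. Since

$$\frac{\partial}{\partial\theta_p}\left\{\left(Q_T(\gamma_\theta) - \tilde\gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n = \frac{\partial\mathbf P_\theta}{\partial\theta_p}\left\{\left(Q_T(\gamma_\theta) - \gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n + \mathbf P_\theta\frac{\partial}{\partial\theta_p}\left\{\left(Q_T(\gamma_\theta) - \gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n,$$

applying (A.19) to the decomposition above produces (A.20). The proof of (A.21) is similar.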






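Lemma A.9. Under Assumptions A2, A5 and A6, there exists a constant $C > 0$ such that

$$\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\mathbf B_\theta^T\right\|_\infty \le Ch, \ \text{a.s.}, \qquad \sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\frac{\partial}{\partial\theta_p}\mathbf B_\theta^T\right\|_\infty \le C, \ \text{a.s.}, \qquad (A.22)$$

$$\sup_{\theta\in S_c^{d-1}}\left\|\mathbf P_\theta\right\|_\infty \le C, \ \text{a.s.}, \qquad \sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|\frac{\partial}{\partial\theta_p}\mathbf P_\theta\right\|_\infty \le Ch^{-1}, \ \text{a.s.} \qquad (A.23)$$

Proof. To prove (A.22), observe that for any vector $\mathbf a \in \mathbb R^n$, with probability 1,

$$\left\|n^{-1}\mathbf B_\theta^T\mathbf a\right\|_\infty \le \|\mathbf a\|_\infty\max_{-3\le j\le N}\frac1n\sum_{i=1}^n B_{j,4}(U_{\theta,i}) \le Ch\|\mathbf a\|_\infty,$$

$$\left\|n^{-1}\frac{\partial}{\partial\theta_p}\mathbf B_\theta^T\mathbf a\right\|_\infty \le \|\mathbf a\|_\infty\max_{-3\le j\le N}\frac1n\sum_{i=1}^n\left|\frac{\partial}{\partial\theta_p}B_{j,4}(U_{\theta,i})\right| \le C\|\mathbf a\|_\infty.$$

To prove (A.23), one only needs to use (A.12), (A.22) and (3.1).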
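The first bounds in (A.22) and (A.23) say that $\|n^{-1}\mathbf B_\theta^T\|_\infty$ is of order $h$ and that the least-squares projection onto the cubic spline space stays bounded in the $\infty$-norm however fine the knots. A small numerical check, assuming i.i.d. uniform $U_{\theta,i}$ and using our own Cox–de Boor helper `cubic_bspline_basis` (a hypothetical name, restated so the block is self-contained), computes $\|\mathbf P_\theta\|_\infty$ for $\mathbf P_\theta = \mathbf B_\theta(\mathbf B_\theta^T\mathbf B_\theta)^{-1}\mathbf B_\theta^T$ and $h^{-1}\|n^{-1}\mathbf B_\theta^T\|_\infty$ with $h = 1/(N+1)$.

```python
import numpy as np

def cubic_bspline_basis(u, n_interior):
    """Cubic B-splines on [0, 1] with n_interior equally spaced interior knots (Cox-de Boor)."""
    k = 4
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    t = np.concatenate([np.zeros(k - 1), inner, np.ones(k - 1)])
    u = np.asarray(u, dtype=float)
    m = len(t) - 1
    B = np.zeros((u.size, m))
    for j in range(m):                        # order-1 splines: interval indicators
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    last = max(j for j in range(m) if t[j] < t[j + 1])
    B[u == 1.0, last] = 1.0                   # close the last interval at u = 1
    for order in range(2, k + 1):
        B_new = np.zeros((u.size, m - order + 1))
        for j in range(m - order + 1):
            d1 = t[j + order - 1] - t[j]
            d2 = t[j + order] - t[j + 1]
            left = (u - t[j]) / d1 * B[:, j] if d1 > 0 else 0.0
            right = (t[j + order] - u) / d2 * B[:, j + 1] if d2 > 0 else 0.0
            B_new[:, j] = left + right
        B = B_new
    return B

rng = np.random.default_rng(5)
n = 2_000
u = rng.uniform(0.0, 1.0, size=n)             # assumed i.i.d. uniform U_{theta,i}

p_inf, bt_scaled = {}, {}
for N in (5, 10, 20):
    B = cubic_bspline_basis(u, N)
    P = B @ np.linalg.solve(B.T @ B, B.T)     # projection matrix P_theta (n x n)
    p_inf[N] = np.abs(P).sum(axis=1).max()    # ||P_theta||_inf
    bt_scaled[N] = (N + 1) * (B.T / n).sum(axis=1).max()  # ||n^{-1} B^T||_inf / h
    print(f"N={N:2d}  ||P||_inf={p_inf[N]:.3f}  ||B^T/n||_inf / h={bt_scaled[N]:.3f}")
```

Both columns should stay roughly constant as $N$ grows, which is the content of the first halves of (A.22) and (A.23) in this simplified setting.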

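Lemma A.10. Under Assumptions A2 and A4–A6, one has with probability 1

$$\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\mathbf B_\theta^T\mathbf E\right\|_\infty = \sup_{\theta\in S_c^{d-1}}\max_{-3\le j\le N}\left|\frac1n\sum_{i=1}^nB_{j,4}(U_{\theta,i})\sigma(\mathbf X_i)\varepsilon_i\right| = O\left(\frac{\log n}{\sqrt{nN}}\right), \qquad (A.24)$$

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|\frac{\partial}{\partial\theta_p}\left(\frac{\mathbf B_\theta^T\mathbf E}{n}\right)\right\|_\infty = O\left(\frac{\log n}{\sqrt{nh}}\right). \qquad (A.25)$$

Similarly, under Assumptions A2 and A4–A6, with probability 1

$$\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\mathbf B_\theta^T\mathbf E_\theta\right\|_\infty = \sup_{\theta\in S_c^{d-1}}\max_{-3\le j\le N}\left|\frac1n\sum_{i=1}^nB_{j,4}(U_{\theta,i})\left\{m(\mathbf X_i) - \gamma_\theta(U_{\theta,i})\right\}\right| = O\left(\frac{\log n}{\sqrt{nN}}\right), \qquad (A.26)$$

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|\frac{\partial}{\partial\theta_p}\left(\frac{\mathbf B_\theta^T\mathbf E_\theta}{n}\right)\right\|_\infty = O\left(\frac{\log n}{\sqrt{nh}}\right). \qquad (A.27)$$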



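Proof. We decompose the noise variable $\varepsilon_i$ into a truncated part and a tail part, $\varepsilon_i = \varepsilon_{i,1}^{D_n} + \varepsilon_{i,2}^{D_n} + m_i^{D_n}$, where $D_n = n^\nu$ ($1/3 < \nu < 2/5$), $\varepsilon_{i,1}^{D_n} = \varepsilon_iI\{|\varepsilon_i| > D_n\}$,

$$\varepsilon_{i,2}^{D_n} = \varepsilon_iI\{|\varepsilon_i| \le D_n\} - m_i^{D_n}, \qquad m_i^{D_n} = E\left[\varepsilon_iI\{|\varepsilon_i| \le D_n\}\mid\mathbf X_i\right].$$

It is straightforward to verify that the mean of the truncated part is uniformly bounded by $D_n^{-2}$, so the boundedness of the B-spline basis and of the function $\sigma$ entails that

$$\sup_{\theta\in S_c^{d-1}}\max_{-3\le j\le N}\left|\frac1n\sum_{i=1}^nB_{j,4}(U_{\theta,i})\sigma(\mathbf X_i)m_i^{D_n}\right| = O\left(D_n^{-2}\right) = o\left(n^{-2/3}\right).$$

The tail part vanishes almost surely: since

$$\sum_{n=1}^\infty P\{|\varepsilon_n| > D_n\} \le \sum_{n=1}^\infty\frac{E|\varepsilon_n|^3}{D_n^3} < \infty,$$

the Borel–Cantelli lemma implies that

$$\sup_{\theta\in S_c^{d-1}}\max_{-3\le j\le N}\left|\frac1n\sum_{i=1}^nB_{j,4}(U_{\theta,i})\sigma(\mathbf X_i)\varepsilon_{i,1}^{D_n}\right| = O\left(n^{-k}\right), \quad\text{for any } k > 0.$$

For the truncated part, using Bernstein's inequality and discretization as in Lemma A.2,

$$\sup_{\theta\in S_c^{d-1}}\max_{-3\le j\le N}\left|\frac1n\sum_{i=1}^nB_{j,4}(U_{\theta,i})\sigma(\mathbf X_i)\varepsilon_{i,2}^{D_n}\right| = O\left(\frac{\log n}{\sqrt{nN}}\right), \quad\text{a.s.}$$

Therefore (A.24) is established since, with probability 1,

$$\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\mathbf B_\theta^T\mathbf E\right\|_\infty \le o\left(n^{-2/3}\right) + O\left(n^{-k}\right) + O\left(\frac{\log n}{\sqrt{nN}}\right) = O\left(\frac{\log n}{\sqrt{nN}}\right).$$

The proofs of (A.25) and (A.26) are similar, as $E\{m(\mathbf X_i) - \gamma_\theta(U_{\theta,i})\mid U_{\theta,i}\} = 0$, but no truncation is needed for (A.26) since $\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}|m(\mathbf X_i) - \gamma_\theta(U_{\theta,i})| \le C < \infty$. Meanwhile, to prove (A.27), we note that for any $p = 1, \ldots, d$,

$$\frac{\partial}{\partial\theta_p}\left(\frac{\mathbf B_\theta^T\mathbf E_\theta}{n}\right) = \left\{\frac1n\sum_{i=1}^n\frac{\partial}{\partial\theta_p}\left[B_{j,4}(U_{\theta,i})\left\{m(\mathbf X_i) - \gamma_\theta(U_{\theta,i})\right\}\right]\right\}_{j=-3}^N.$$

According to (2.6), one has $\gamma_\theta(U_\theta) = E\{m(\mathbf X)\mid U_\theta\}$; hence

$$E\left[B_{j,4}(U_\theta)\left\{m(\mathbf X) - \gamma_\theta(U_\theta)\right\}\right] = 0, \qquad -3\le j\le N, \ \theta\in S_c^{d-1}.$$

Applying Assumptions A2 and A3, one can differentiate through the expectation; thus

$$E\left\{\frac{\partial}{\partial\theta_p}\left[B_{j,4}(U_\theta)\left\{m(\mathbf X) - \gamma_\theta(U_\theta)\right\}\right]\right\} = 0, \qquad 1\le p\le d, \ -3\le j\le N, \ \theta\in S_c^{d-1},$$

which allows one to apply Bernstein's inequality to obtain that, with probability 1,

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\left\|\left\{\frac1n\sum_{i=1}^n\frac{\partial}{\partial\theta_p}\left[B_{j,4}(U_{\theta,i})\left\{m(\mathbf X_i) - \gamma_\theta(U_{\theta,i})\right\}\right]\right\}_{j=-3}^N\right\|_\infty = O\left\{(nh)^{-1/2}\log n\right\},$$

which is (A.27).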
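The truncation step rests on the elementary bound $|m_i^{D_n}| = |E[\varepsilon_iI\{|\varepsilon_i| > D_n\}\mid\mathbf X_i]| \le E|\varepsilon_i|^3/D_n^2$, valid when $E(\varepsilon_i\mid\mathbf X_i) = 0$. The sketch below checks it in closed form for an illustrative centred exponential error $\varepsilon = Z - 1$, $Z \sim \mathrm{Exp}(1)$, independent of $\mathbf X$ (an assumption made only for this example). For $D \ge 1$ one has $m^D = -(D+1)e^{-(D+1)}$ and $E|\varepsilon|^3 = 12/e - 2$; a Monte Carlo estimate double-checks the closed form.

```python
import math
import numpy as np

def truncated_mean(D):
    """m^D = E[eps 1{|eps| <= D}] for eps = Z - 1, Z ~ Exp(1); closed form valid for D >= 1."""
    return -(D + 1.0) * math.exp(-(D + 1.0))

third_abs_moment = 12.0 / math.e - 2.0       # E|eps|^3 for the centred exponential
nu = 0.35                                    # any exponent with 1/3 < nu < 2/5

rows = []
for n in (10, 100, 1_000, 10_000, 100_000):
    D = n ** nu                              # truncation level D_n = n^nu
    m_D = truncated_mean(D)
    bound = third_abs_moment / D ** 2        # E|eps|^3 / D_n^2
    rows.append((n, D, m_D, bound))
    print(f"n={n:>7}  D_n={D:7.3f}  |m^D|={abs(m_D):.3e}  bound={bound:.3e}")

# Monte Carlo sanity check of the closed form at the smallest truncation level.
rng = np.random.default_rng(2)
eps = rng.exponential(1.0, size=1_000_000) - 1.0
D = 10 ** nu
mc = np.mean(eps * (np.abs(eps) <= D))
closed = truncated_mean(D)
print(f"Monte Carlo {mc:.4f} vs closed form {closed:.4f}")
```

For this light-tailed example the truncated mean is far below the $D_n^{-2}$ bound; the proof needs only the bound, which requires nothing beyond a finite third moment.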

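Lemma A.11. Under Assumptions A2 and A4–A6, for $\hat\varepsilon_\theta(u)$ in (A.9), one has

$$\sup_{\theta\in S_c^{d-1}}\sup_{u\in[0,1]}\left|\hat\varepsilon_\theta(u)\right| = O\left\{(nh)^{-1/2}\log n\right\}, \quad\text{a.s.} \qquad (A.28)$$

Proof. Denote $\hat{\mathbf a} = (\hat a_{-3}, \ldots, \hat a_N)^T = (\mathbf B_\theta^T\mathbf B_\theta)^{-1}\mathbf B_\theta^T\mathbf E = \mathbf V_{n,\theta}^{-1}(n^{-1}\mathbf B_\theta^T\mathbf E)$; then $\hat\varepsilon_\theta(u) = \sum_{j=-3}^N\hat a_jB_{j,4}(u)$, so the order of $\hat\varepsilon_\theta(u)$ is related to that of $\hat{\mathbf a}$. In fact, by Theorem 5.4.2 in DeVore and Lorentz (1993),

$$\sup_{\theta\in S_c^{d-1}}\sup_{u\in[0,1]}\left|\hat\varepsilon_\theta(u)\right| \le \sup_{\theta\in S_c^{d-1}}\|\hat{\mathbf a}\|_\infty = \sup_{\theta\in S_c^{d-1}}\left\|\mathbf V_{n,\theta}^{-1}\left(n^{-1}\mathbf B_\theta^T\mathbf E\right)\right\|_\infty \le CN\sup_{\theta\in S_c^{d-1}}\left\|n^{-1}\mathbf B_\theta^T\mathbf E\right\|_\infty, \quad\text{a.s.},$$

where the last inequality follows from (A.12) of Lemma A.5. Applying (A.24) of Lemma A.10, we have established (A.28).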

Lemma A.12. Under Assumptions A2 and A4–A6, for $\tilde\varepsilon_\theta(u)$ in (A.8), one has

$$\sup_{\theta\in S_c^{d-1}}\sup_{u\in[0,1]}\left|\tilde\varepsilon_\theta(u)\right| = O\left\{(nh)^{-1/2}\log n\right\}, \quad\text{a.s.} \qquad (A.29)$$

The proof is similar to that of Lemma A.11 and is thus omitted. The next result evaluates the uniform size of the noise derivatives.



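Lemma A.13. Under Assumptions A2–A6, one has with probability 1

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left|\frac{\partial}{\partial\theta_p}\hat\varepsilon_\theta(U_{\theta,i})\right| = O\left\{(nh^3)^{-1/2}\log n\right\}, \qquad (A.30)$$

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left|\frac{\partial}{\partial\theta_p}\tilde\varepsilon_\theta(U_{\theta,i})\right| = O\left\{(nh^3)^{-1/2}\log n\right\}, \qquad (A.31)$$

$$\sup_{1\le p,q\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left|\frac{\partial^2}{\partial\theta_p\partial\theta_q}\hat\varepsilon_\theta(U_{\theta,i})\right| = O\left\{(nh^5)^{-1/2}\log n\right\}, \qquad (A.32)$$

$$\sup_{1\le p,q\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left|\frac{\partial^2}{\partial\theta_p\partial\theta_q}\tilde\varepsilon_\theta(U_{\theta,i})\right| = O\left\{(nh^5)^{-1/2}\log n\right\}. \qquad (A.33)$$

Proof. Note that, with $\mathbf B_p = \partial\mathbf B_\theta/\partial\theta_p$,

$$\frac{\partial}{\partial\theta_p}\left\{\hat\varepsilon_\theta(U_{\theta,i})\right\}_{i=1}^n = \frac{\partial}{\partial\theta_p}\left(\mathbf P_\theta\mathbf E\right) = (\mathbf I - \mathbf P_\theta)\mathbf B_p\left(\mathbf B_\theta^T\mathbf B_\theta\right)^{-1}\mathbf B_\theta^T\mathbf E + \mathbf B_\theta\left(\mathbf B_\theta^T\mathbf B_\theta\right)^{-1}\mathbf B_p^T(\mathbf I - \mathbf P_\theta)\mathbf E.$$

Applying (A.24) and (A.25) of Lemma A.10, (A.12) of Lemma A.5, and (A.22) and (A.23) of Lemma A.9, one derives (A.30). To prove (A.31), note that

$$\frac{\partial}{\partial\theta_p}\left\{\tilde\varepsilon_\theta(U_{\theta,i})\right\}_{i=1}^n = \frac{\partial}{\partial\theta_p}\left(\mathbf P_\theta\mathbf E_\theta\right) = \frac{\partial\mathbf P_\theta}{\partial\theta_p}\mathbf E_\theta + \mathbf P_\theta\frac{\partial\mathbf E_\theta}{\partial\theta_p} = \mathbf T_1 + \mathbf T_2, \qquad (A.34)$$

in which

$$\mathbf T_1 = \left\{(\mathbf I - \mathbf P_\theta)\mathbf B_p - \mathbf B_\theta\left(\mathbf B_\theta^T\mathbf B_\theta\right)^{-1}\mathbf B_p^T\mathbf B_\theta\right\}\left(\mathbf B_\theta^T\mathbf B_\theta\right)^{-1}\mathbf B_\theta^T\mathbf E_\theta = \left\{(\mathbf I - \mathbf P_\theta)\mathbf B_p - \mathbf B_\theta\left(\frac{\mathbf B_\theta^T\mathbf B_\theta}{n}\right)^{-1}\frac{\mathbf B_p^T\mathbf B_\theta}{n}\right\}\left(\frac{\mathbf B_\theta^T\mathbf B_\theta}{n}\right)^{-1}\frac{\mathbf B_\theta^T\mathbf E_\theta}{n},$$

$$\mathbf T_2 = \mathbf B_\theta\left(\frac{\mathbf B_\theta^T\mathbf B_\theta}{n}\right)^{-1}\frac{\partial}{\partial\theta_p}\left(\frac{\mathbf B_\theta^T\mathbf E_\theta}{n}\right).$$

By (A.24), (A.12), (A.22) and (A.23), one derives

$$\sup_{\theta\in S_c^{d-1}}\|\mathbf T_1\|_\infty = O\left(n^{-1/2}N^{3/2}\log n\right), \quad\text{a.s.}, \qquad (A.35)$$

while (A.27) of Lemma A.10 and (A.12) of Lemma A.5 entail that

$$\sup_{\theta\in S_c^{d-1}}\|\mathbf T_2\|_\infty = N\times O\left(n^{-1/2}h^{-1/2}\log n\right) = O\left(n^{-1/2}h^{-3/2}\log n\right), \quad\text{a.s.} \qquad (A.36)$$

Now, putting together (A.34), (A.35) and (A.36), we have established (A.31). The proofs of (A.32) and (A.33) are similar.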
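The expression for $\partial(\mathbf P_\theta\mathbf E)/\partial\theta_p$ used in this proof is the standard derivative of an orthogonal projection: if $\mathbf B$ depends smoothly on a parameter with derivative $\mathbf B_p$, then $\partial\mathbf P = (\mathbf I - \mathbf P)\mathbf B_p(\mathbf B^T\mathbf B)^{-1}\mathbf B^T + \mathbf B(\mathbf B^T\mathbf B)^{-1}\mathbf B_p^T(\mathbf I - \mathbf P)$. The sketch below verifies this by central finite differences along an illustrative path $\mathbf B(t) = \mathbf B_0 + t\mathbf B_1$ with random full-rank matrices (in the paper, $\mathbf B_\theta$ is the B-spline design matrix and $\mathbf B_p = \partial\mathbf B_\theta/\partial\theta_p$).

```python
import numpy as np

def projection(B):
    """Orthogonal projection onto the column space of B: B (B^T B)^{-1} B^T."""
    return B @ np.linalg.solve(B.T @ B, B.T)

def projection_derivative(B, Bp):
    """Analytic derivative of projection(B) in the direction Bp."""
    P = projection(B)
    I = np.eye(B.shape[0])
    G_inv = np.linalg.inv(B.T @ B)
    return (I - P) @ Bp @ G_inv @ B.T + B @ G_inv @ Bp.T @ (I - P)

rng = np.random.default_rng(3)
n, p = 30, 6
B0 = rng.standard_normal((n, p))
B1 = rng.standard_normal((n, p))

step = 1e-6
fd = (projection(B0 + step * B1) - projection(B0 - step * B1)) / (2 * step)
analytic = projection_derivative(B0, B1)
max_err = np.max(np.abs(fd - analytic))
print(f"max |finite difference - analytic| = {max_err:.2e}")
```

Multiplying both sides by a fixed vector $\mathbf E$ gives exactly the two-term expansion above; for $\mathbf E_\theta$, which itself depends on $\theta$, the extra term $\mathbf P_\theta\,\partial\mathbf E_\theta/\partial\theta_p$ appears, as in (A.34).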

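Proof of Proposition 2.1. According to the decomposition (A.6),

$$\left|\hat\gamma_\theta(u) - \gamma_\theta(u)\right| \le \left|\tilde\gamma_\theta(u) - \gamma_\theta(u)\right| + \left|\tilde\varepsilon_\theta(u)\right| + \left|\hat\varepsilon_\theta(u)\right|.$$

Then (2.13) follows directly from (A.16) of Lemma A.7, (A.28) of Lemma A.11 and (A.29) of Lemma A.12. Again by definitions (A.8) and (A.9), we write

$$\frac{\partial}{\partial\theta_p}\left\{\left(\hat\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\right\} = \frac{\partial}{\partial\theta_p}\left\{\left(\tilde\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\right\} + \frac{\partial}{\partial\theta_p}\tilde\varepsilon_\theta(U_{\theta,i}) + \frac{\partial}{\partial\theta_p}\hat\varepsilon_\theta(U_{\theta,i}).$$

It is clear from (A.20), (A.30) and (A.31) that with probability 1

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left|\frac{\partial}{\partial\theta_p}\left(\tilde\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\right| = O\left(h^3\right),$$

$$\sup_{1\le p\le d}\sup_{\theta\in S_c^{d-1}}\max_{1\le i\le n}\left\{\left|\frac{\partial}{\partial\theta_p}\tilde\varepsilon_\theta(U_{\theta,i})\right| + \left|\frac{\partial}{\partial\theta_p}\hat\varepsilon_\theta(U_{\theta,i})\right|\right\} = O\left\{(nh^3)^{-1/2}\log n\right\}.$$

Putting together all the above yields (2.14). The proof of (2.15) is similar.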
A.3. Proof of Proposition 2.2

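Lemma A.14. Under Assumptions A2–A6, one has

$$\sup_{\theta\in S_c^{d-1}}\left|\hat R(\theta) - R(\theta)\right| = o(1), \quad\text{a.s.}$$

Proof. For the empirical risk function $\hat R(\theta)$ in (2.9), one has

$$\hat R(\theta) = n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right\}^2 = n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i}) + \gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right\}^2;$$

hence

$$\begin{aligned}\hat R(\theta) ={}& n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}^2 + n^{-1}\sum_{i=1}^n\sigma^2(\mathbf X_i)\varepsilon_i^2\\ &+ 2n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right\}\\ &+ n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i)\right\}^2 - 2n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i)\right\}\sigma(\mathbf X_i)\varepsilon_i,\end{aligned}$$

where $\hat\gamma_\theta$ is defined in (2.8). Using the expression of $R(\theta)$ in (2.7), one has

$$\sup_{\theta\in S_c^{d-1}}\left|\hat R(\theta) - R(\theta)\right| \le I_1 + I_2 + I_3 + I_4,$$

with

$$I_1 = \sup_{\theta\in S_c^{d-1}}n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}^2,$$

$$I_2 = \sup_{\theta\in S_c^{d-1}}\left|2n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right\}\right|,$$

$$I_3 = \sup_{\theta\in S_c^{d-1}}\left|n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i)\right\}^2 - E\left\{\gamma_\theta(U_\theta) - m(\mathbf X)\right\}^2\right|,$$

$$I_4 = \sup_{\theta\in S_c^{d-1}}\left|n^{-1}\sum_{i=1}^n\sigma^2(\mathbf X_i)\varepsilon_i^2 - E\sigma^2(\mathbf X)\varepsilon^2 - 2n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i)\right\}\sigma(\mathbf X_i)\varepsilon_i\right|.$$

Bernstein's inequality and the strong law of large numbers for a mixing sequence imply that

$$I_3 + I_4 = o(1), \quad\text{a.s.} \qquad (A.37)$$

Now (2.13) of Proposition 2.1 provides that

$$\sup_{\theta\in S_c^{d-1}}\sup_{u\in[0,1]}\left|\hat\gamma_\theta(u) - \gamma_\theta(u)\right| = O\left(n^{-1/2}h^{-1/2}\log n + h^4\right), \quad\text{a.s.},$$

which entails that

$$I_1 = O\left\{\left(n^{-1/2}h^{-1/2}\log n\right)^2 + \left(h^4\right)^2\right\}, \quad\text{a.s.}, \qquad (A.38)$$

$$I_2 \le O\left\{(nh)^{-1/2}\log n + h^4\right\}\times\sup_{\theta\in S_c^{d-1}}2n^{-1}\sum_{i=1}^n\left|\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right|.$$

Hence

$$I_2 = O\left(n^{-1/2}h^{-1/2}\log n + h^4\right), \quad\text{a.s.} \qquad (A.39)$$

The lemma now follows from (A.37), (A.38), (A.39) and Assumption A6.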
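Lemma A.15. Under Assumptions A2–A6, one has

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|\frac{\partial}{\partial\theta_p}\left\{\hat R(\theta) - R(\theta)\right\} - n^{-1}\sum_{i=1}^n\xi_{\theta,i,p}\right| = o\left(n^{-1/2}\right), \quad\text{a.s.}, \qquad (A.40)$$

in which

$$\xi_{\theta,i,p} = 2\left\{\gamma_\theta(U_{\theta,i}) - Y_i\right\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}) - \frac{\partial}{\partial\theta_p}R(\theta), \qquad E\left(\xi_{\theta,i,p}\right) = 0. \qquad (A.41)$$

Furthermore, for $k = 1, 2$,

$$\sup_{\theta\in S_c^{d-1}}\left|\frac{\partial^k}{\partial\theta^k}\left\{\hat R(\theta) - R(\theta)\right\}\right| = O\left(n^{-1/2}h^{-1/2-k}\log n + h^{4-k}\right), \quad\text{a.s.} \qquad (A.42)$$

Proof. Note that for any $p = 1, 2, \ldots, d$,

$$\frac{\partial}{\partial\theta_p}\hat R(\theta) = 2n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - Y_i\right\}\frac{\partial}{\partial\theta_p}\hat\gamma_\theta(U_{\theta,i}),$$

$$\frac{\partial}{\partial\theta_p}R(\theta) = 2E\left[\left\{\gamma_\theta(U_\theta) - m(\mathbf X)\right\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\right] = 2E\left[\left\{\gamma_\theta(U_\theta) - m(\mathbf X) - \sigma(\mathbf X)\varepsilon\right\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\right].$$

Thus $E(\xi_{\theta,i,p}) = 2E\{\gamma_\theta(U_{\theta,i}) - Y_i\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}) - \frac{\partial}{\partial\theta_p}R(\theta) = 0$ and

$$\frac{\partial}{\partial\theta_p}\left\{\hat R(\theta) - R(\theta)\right\} - n^{-1}\sum_{i=1}^n\xi_{\theta,i,p} = 2\left(J_{1,\theta,p} + J_{2,\theta,p} + J_{3,\theta,p}\right), \qquad (A.43)$$

with

$$J_{1,\theta,p} = n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}\frac{\partial}{\partial\theta_p}\left(\hat\gamma_\theta - \gamma_\theta\right)(U_{\theta,i}),$$

$$J_{2,\theta,p} = n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - Y_i\right\}\frac{\partial}{\partial\theta_p}\left(\hat\gamma_\theta - \gamma_\theta\right)(U_{\theta,i}),$$

$$J_{3,\theta,p} = n^{-1}\sum_{i=1}^n\left\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\right\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}).$$

Bernstein's inequality implies that

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|n^{-1}\sum_{i=1}^n\xi_{\theta,i,p}\right| = O\left(n^{-1/2}\log n\right), \quad\text{a.s.} \qquad (A.44)$$

Meanwhile, applying (2.13) and (2.14) of Proposition 2.1, one obtains that

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|J_{1,\theta,p}\right| = O\left\{(nh)^{-1/2}\log n + h^4\right\}\times O\left\{(nh^3)^{-1/2}\log n + h^3\right\} = O\left(n^{-1}h^{-2}\log^2n + h^7\right), \quad\text{a.s.} \qquad (A.45)$$

Note that, since $\hat{\boldsymbol\gamma}_\theta - \boldsymbol\gamma_\theta = \{(\tilde\gamma_\theta - \gamma_\theta)(U_{\theta,i})\}_{i=1}^n + \mathbf P_\theta(\mathbf E + \mathbf E_\theta)$,

$$\begin{aligned}J_{2,\theta,p} &= n^{-1}\sum_{i=1}^n\left\{\gamma_\theta(U_{\theta,i}) - m(\mathbf X_i) - \sigma(\mathbf X_i)\varepsilon_i\right\}\frac{\partial}{\partial\theta_p}\left(\hat\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\\ &= -n^{-1}(\mathbf E + \mathbf E_\theta)^T\frac{\partial}{\partial\theta_p}\left\{\left(\tilde\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\right\}_{i=1}^n - n^{-1}(\mathbf E + \mathbf E_\theta)^T\frac{\partial}{\partial\theta_p}\left\{\mathbf P_\theta(\mathbf E + \mathbf E_\theta)\right\}.\end{aligned}$$

Applying (A.20), one gets

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|J_{2,\theta,p} + n^{-1}(\mathbf E + \mathbf E_\theta)^T\frac{\partial}{\partial\theta_p}\left\{\mathbf P_\theta(\mathbf E + \mathbf E_\theta)\right\}\right| = O\left(h^3\right), \quad\text{a.s.},$$

while (A.24), (A.26) and (A.12) entail that with probability 1

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|n^{-1}(\mathbf E + \mathbf E_\theta)^T\frac{\partial}{\partial\theta_p}\left\{\mathbf P_\theta(\mathbf E + \mathbf E_\theta)\right\}\right| = O\left\{(nN)^{-1/2}\log n\right\}\times N\times N\times O\left\{(nN)^{-1/2}\log n\right\} = O\left(n^{-1}N\log^2n\right);$$

thus

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|J_{2,\theta,p}\right| = O\left(h^3 + n^{-1}N\log^2n\right), \quad\text{a.s.} \qquad (A.46)$$

Lastly,

$$J_{3,\theta,p} = n^{-1}(\mathbf E + \mathbf E_\theta)^T\mathbf B_\theta\left(\mathbf B_\theta^T\mathbf B_\theta\right)^{-1}\mathbf B_\theta^T\left\{\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i})\right\}_{i=1}^n + n^{-1}\sum_{i=1}^n\left(\tilde\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}).$$

By applying (A.24), (A.26) and (A.12), it is clear that with probability 1

$$\begin{aligned}&\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|\left(n^{-1}\mathbf B_\theta^T\mathbf E + n^{-1}\mathbf B_\theta^T\mathbf E_\theta\right)^T\left(\frac{\mathbf B_\theta^T\mathbf B_\theta}{n}\right)^{-1}\frac1n\mathbf B_\theta^T\left\{\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i})\right\}_{i=1}^n\right|\\ &\quad= O\left\{(nN)^{-1/2}\log n\right\}\times N\times O\left\{h + (nN)^{-1/2}\log n\right\} = O\left\{n^{-1}\log^2n + (nN)^{-1/2}\log n\right\},\end{aligned}$$

while by applying (A.16) of Lemma A.7 one has

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|n^{-1}\sum_{i=1}^n\left(\tilde\gamma_\theta - \gamma_\theta\right)(U_{\theta,i})\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i})\right| = O\left(h^4\right), \quad\text{a.s.};$$

together, the above entail that

$$\sup_{\theta\in S_c^{d-1}}\sup_{1\le p\le d}\left|J_{3,\theta,p}\right| = O\left\{h^4 + n^{-1}\log^2n + (nN)^{-1/2}\log n\right\}, \quad\text{a.s.} \qquad (A.47)$$

Therefore, (A.43), (A.45), (A.46), (A.47) and Assumption A6 lead to (A.40), which, together with (A.44), establishes (A.42) for $k = 1$.

Note that the second-order derivatives of $\hat R(\theta)$ and $R(\theta)$ with respect to $\theta_p$, $\theta_q$ are

$$\frac{\partial^2}{\partial\theta_p\partial\theta_q}\hat R(\theta) = 2n^{-1}\sum_{i=1}^n\left[\left\{\hat\gamma_\theta(U_{\theta,i}) - Y_i\right\}\frac{\partial^2}{\partial\theta_p\partial\theta_q}\hat\gamma_\theta(U_{\theta,i}) + \frac{\partial}{\partial\theta_p}\hat\gamma_\theta(U_{\theta,i})\frac{\partial}{\partial\theta_q}\hat\gamma_\theta(U_{\theta,i})\right],$$

$$\frac{\partial^2}{\partial\theta_p\partial\theta_q}R(\theta) = 2E\left[\left\{\gamma_\theta(U_\theta) - m(\mathbf X)\right\}\frac{\partial^2}{\partial\theta_p\partial\theta_q}\gamma_\theta(U_\theta) + \frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\frac{\partial}{\partial\theta_q}\gamma_\theta(U_\theta)\right].$$

The proof of (A.42) for $k = 2$ follows from (2.13), (2.14) and (2.15).

Proof of Proposition 2.2. The result follows from Lemma A.14, Lemma A.15, and equations (A.40) and (A.41).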



A.4. Proof of Theorem 2

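Let $\hat S^*_p(\theta_{-d})$ be the $p$-th element of $\hat S^*(\theta_{-d})$ and, for $\gamma_\theta$ in (2.6), denote

$$\eta_{i,p} := 2\left\{\dot\gamma_p - \theta_{0,p}\theta_{0,d}^{-1}\dot\gamma_d\right\}(U_{\theta_0,i})\left\{\gamma_{\theta_0}(U_{\theta_0,i}) - Y_i\right\}, \qquad (A.48)$$

where $\dot\gamma_p$ is the value of $\partial\gamma_\theta/\partial\theta_p$ taken at $\theta = \theta_0$, for any $p = 1, 2, \ldots, d$.

Lemma A.16. Under Assumptions A2–A6, one has

$$\sup_{1\le p\le d-1}\left|\hat S^*_p(\theta_{0,-d}) - n^{-1}\sum_{i=1}^n\eta_{i,p}\right| = o\left(n^{-1/2}\right), \quad\text{a.s.} \qquad (A.49)$$

Proof. Since $R^*$ attains its minimum at $\theta_{0,-d}$, $S^*_p(\theta_{0,-d}) = 0$ for $p = 1, \ldots, d-1$. Hence, by (A.41) and (A.48), for any $p = 1, \ldots, d-1$,

$$n^{-1}\sum_{i=1}^n\xi_{\theta_0,i,p} - \theta_{0,p}\theta_{0,d}^{-1}n^{-1}\sum_{i=1}^n\xi_{\theta_0,i,d} = n^{-1}\sum_{i=1}^n\eta_{i,p}, \qquad E(\eta_{i,p}) = 0.$$

Therefore, according to (A.40), (A.41) and (A.48),

$$\sup_{1\le p\le d-1}\left|\hat S^*_p(\theta_{0,-d}) - S^*_p(\theta_{0,-d}) - n^{-1}\sum_{i=1}^n\eta_{i,p}\right| = o\left(n^{-1/2}\right), \quad\text{a.s.},$$

which, since $S^*_p(\theta_{0,-d}) = 0$, yields (A.49).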
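Lemma A.17. The $(p,q)$-th entry of the Hessian matrix $H^*(\theta_{0,-d})$ equals $l_{p,q}$ given in Theorem 2.

Proof. It is easy to show that for any $p, q = 1, 2, \ldots, d$,

$$\frac{\partial}{\partial\theta_p}R(\theta) = \frac{\partial}{\partial\theta_p}E\left\{m(\mathbf X) - \gamma_\theta(U_\theta)\right\}^2 = -2E\left\{\gamma_\theta(U_\theta)\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\right\},$$

$$\frac{\partial^2}{\partial\theta_p\partial\theta_q}R(\theta) = -2E\left\{\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\frac{\partial}{\partial\theta_q}\gamma_\theta(U_\theta) + \gamma_\theta(U_\theta)\frac{\partial^2}{\partial\theta_p\partial\theta_q}\gamma_\theta(U_\theta)\right\}.$$

Note that, since $\theta_d = (1 - \|\theta_{-d}\|_2^2)^{1/2}$ gives $\partial\theta_d/\partial\theta_p = -\theta_p\theta_d^{-1}$ for $p = 1, \ldots, d-1$,

$$\frac{\partial}{\partial\theta_p}R^*(\theta_{-d}) = \frac{\partial}{\partial\theta_p}R(\theta) - \theta_p\theta_d^{-1}\frac{\partial}{\partial\theta_d}R(\theta). \qquad (A.50)$$

Thus, for $p, q = 1, \ldots, d-1$, with $\delta_{pq}$ the Kronecker delta,

$$\frac{\partial^2}{\partial\theta_p\partial\theta_q}R^*(\theta_{-d}) = \frac{\partial^2R(\theta)}{\partial\theta_p\partial\theta_q} - \frac{\theta_q}{\theta_d}\frac{\partial^2R(\theta)}{\partial\theta_p\partial\theta_d} - \frac{\theta_p}{\theta_d}\frac{\partial^2R(\theta)}{\partial\theta_q\partial\theta_d} + \frac{\theta_p\theta_q}{\theta_d^2}\frac{\partial^2R(\theta)}{\partial\theta_d^2} - \left(\frac{\delta_{pq}}{\theta_d} + \frac{\theta_p\theta_q}{\theta_d^3}\right)\frac{\partial R(\theta)}{\partial\theta_d}.$$

Writing $\dot\gamma_p$ as in (A.48) and $\ddot\gamma_{pq}$ for the value of $\partial^2\gamma_\theta/\partial\theta_p\partial\theta_q$ at $\theta = \theta_0$, with all functions evaluated at $U_{\theta_0}$, and substituting the derivatives of $R(\theta)$ above, one obtains

$$\begin{aligned}\frac{\partial^2}{\partial\theta_p\partial\theta_q}R^*(\theta_{0,-d}) ={}& -2E\left(\dot\gamma_p\dot\gamma_q + \gamma_{\theta_0}\ddot\gamma_{pq}\right) + \frac{2\theta_{0,q}}{\theta_{0,d}}E\left(\dot\gamma_p\dot\gamma_d + \gamma_{\theta_0}\ddot\gamma_{pd}\right) + \frac{2\theta_{0,p}}{\theta_{0,d}}E\left(\dot\gamma_q\dot\gamma_d + \gamma_{\theta_0}\ddot\gamma_{qd}\right)\\ &- \frac{2\theta_{0,p}\theta_{0,q}}{\theta_{0,d}^2}E\left(\dot\gamma_d^2 + \gamma_{\theta_0}\ddot\gamma_{dd}\right) + 2\left(\frac{\delta_{pq}}{\theta_{0,d}} + \frac{\theta_{0,p}\theta_{0,q}}{\theta_{0,d}^3}\right)E\left(\gamma_{\theta_0}\dot\gamma_d\right).\end{aligned}$$

Therefore we obtain the desired result.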
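The chain-rule identity (A.50) and the Hessian formula above depend only on the parametrisation $\theta_d = (1 - \|\theta_{-d}\|_2^2)^{1/2}$, so they can be checked on any smooth surrogate risk. The sketch below uses a quadratic surrogate $R(\theta) = \theta^TA\theta + b^T\theta$ (an illustrative stand-in, not the paper's risk) and compares both formulas with central finite differences of $R^*(\theta_{-d}) = R(\theta_{-d}, \theta_d)$; all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
M = rng.standard_normal((d, d))
A = M + M.T                                   # symmetric matrix of the surrogate risk
b = rng.standard_normal(d)

def grad_R(theta):
    return 2 * A @ theta + b

def hess_R(theta):
    return 2 * A

def full_theta(x):
    """Map theta_{-d} to theta = (theta_{-d}, sqrt(1 - ||theta_{-d}||^2))."""
    return np.append(x, np.sqrt(1.0 - x @ x))

def R_star(x):
    th = full_theta(x)
    return th @ A @ th + b @ th

def grad_R_star(x):
    th = full_theta(x)
    g = grad_R(th)
    return g[:-1] - th[:-1] / th[-1] * g[-1]                  # identity (A.50)

def hess_R_star(x):
    th = full_theta(x)
    g, H = grad_R(th), hess_R(th)
    tp, td = th[:-1], th[-1]
    return (H[:-1, :-1]
            - np.outer(H[:-1, -1], tp) / td                   # -theta_q/theta_d * R_pd
            - np.outer(tp, H[:-1, -1]) / td                   # -theta_p/theta_d * R_qd
            + np.outer(tp, tp) / td**2 * H[-1, -1]
            - (np.eye(len(tp)) / td + np.outer(tp, tp) / td**3) * g[-1])

x0 = np.array([0.3, -0.2, 0.1])
eps = 1e-4
E = np.eye(len(x0))
fd_grad = np.array([(R_star(x0 + eps * E[p]) - R_star(x0 - eps * E[p])) / (2 * eps)
                    for p in range(len(x0))])
fd_hess = np.array([[(R_star(x0 + eps * E[p] + eps * E[q]) - R_star(x0 + eps * E[p] - eps * E[q])
                      - R_star(x0 - eps * E[p] + eps * E[q]) + R_star(x0 - eps * E[p] - eps * E[q]))
                     / (4 * eps**2) for q in range(len(x0))] for p in range(len(x0))])
print(np.max(np.abs(fd_grad - grad_R_star(x0))), np.max(np.abs(fd_hess - hess_R_star(x0))))
```

Both printed discrepancies should be tiny; the substitution of the specific derivatives of $R(\theta)$ in the lemma then only requires the expressions for $\partial R/\partial\theta_p$ and $\partial^2R/\partial\theta_p\partial\theta_q$.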
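Proof of Theorem 2. For any $p = 1, 2, \ldots, d-1$, let

$$f_p(t) = \hat S^*_p\left(t\hat\theta_{-d} + (1-t)\theta_{0,-d}\right), \qquad t\in[0,1];$$

then

$$\frac{d}{dt}f_p(t) = \sum_{q=1}^{d-1}\frac{\partial}{\partial\theta_q}\hat S^*_p\left\{t\hat\theta_{-d} + (1-t)\theta_{0,-d}\right\}\left(\hat\theta_q - \theta_{0,q}\right). \qquad (A.51)$$

Note that $\hat R^*(\theta_{-d})$ attains its minimum at $\hat\theta_{-d}$, i.e., $\hat S^*(\hat\theta_{-d}) = 0$. Thus, for any $p = 1, 2, \ldots, d-1$, by the mean value theorem there exists $t_p \in [0, 1]$ such that

$$-\hat S^*_p(\theta_{0,-d}) = \hat S^*_p(\hat\theta_{-d}) - \hat S^*_p(\theta_{0,-d}) = f_p(1) - f_p(0) = \left[\frac{\partial^2}{\partial\theta_q\partial\theta_p}\hat R^*\left\{t_p\hat\theta_{-d} + (1-t_p)\theta_{0,-d}\right\}\right]^T_{q=1,\ldots,d-1}\left(\hat\theta_{-d} - \theta_{0,-d}\right);$$

then

$$-\hat S^*(\theta_{0,-d}) = \left[\frac{\partial^2}{\partial\theta_q\partial\theta_p}\hat R^*\left\{t_p\hat\theta_{-d} + (1-t_p)\theta_{0,-d}\right\}\right]_{p,q=1,\ldots,d-1}\left(\hat\theta_{-d} - \theta_{0,-d}\right).$$

Now (2.11) of Theorem 1 and Proposition 2.2 with $k = 2$ imply that, uniformly in $p, q = 1, 2, \ldots, d-1$,

$$\frac{\partial^2}{\partial\theta_q\partial\theta_p}\hat R^*\left\{t_p\hat\theta_{-d} + (1-t_p)\theta_{0,-d}\right\} = l_{p,q} + o(1), \quad\text{a.s.}, \qquad (A.52)$$

where $l_{p,q}$ is given in Theorem 2. Note that $\sqrt n(\hat\theta_{-d} - \theta_{0,-d})$ is represented as

$$-\left[\frac{\partial^2}{\partial\theta_q\partial\theta_p}\hat R^*\left\{t_p\hat\theta_{-d} + (1-t_p)\theta_{0,-d}\right\}\right]^{-1}_{p,q=1,\ldots,d-1}\sqrt n\,\hat S^*(\theta_{0,-d}),$$

where $\hat S^*(\theta_{0,-d}) = \{\hat S^*_p(\theta_{0,-d})\}_{p=1}^{d-1}$ and, according to (A.48) and Lemma A.16,

$$\hat S^*_p(\theta_{0,-d}) = n^{-1}\sum_{i=1}^n\eta_{i,p} + o\left(n^{-1/2}\right), \quad\text{a.s.}, \qquad E(\eta_{i,p}) = 0.$$

Let $\Psi(\theta_0) = (\psi_{pq})_{p,q=1}^{d-1}$ be the covariance matrix of $\sqrt n\,\hat S^*(\theta_{0,-d})$, with $\psi_{pq}$ given in Theorem 2. The Cramér–Wold device and the central limit theorem for mixing sequences entail that

$$\sqrt n\,\hat S^*(\theta_{0,-d}) \xrightarrow{d} N\{\mathbf 0, \Psi(\theta_0)\}.$$

Let $\Sigma(\theta_0) = \{H^*(\theta_{0,-d})\}^{-1}\Psi(\theta_0)\left[\{H^*(\theta_{0,-d})\}^T\right]^{-1}$, with $H^*(\theta_{0,-d})$ being the Hessian matrix defined in (2.3). The above limiting distribution of $\sqrt n\,\hat S^*(\theta_{0,-d})$, (A.52) and Slutsky's theorem imply that

$$\sqrt n\left(\hat\theta_{-d} - \theta_{0,-d}\right) \xrightarrow{d} N\{\mathbf 0, \Sigma(\theta_0)\}.$$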
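The limiting covariance $\Sigma(\theta_0) = \{H^*\}^{-1}\Psi[\{H^*\}^T]^{-1}$ has the usual sandwich form. Given estimates of the Hessian $H^*(\theta_{0,-d})$ and of $\Psi(\theta_0)$, it could be formed and turned into approximate standard errors for $\hat\theta_{-d}$ as in the hypothetical helper below; the numerical inputs are made up for illustration and are not taken from the paper.

```python
import numpy as np

def sandwich_covariance(H, Psi):
    """Sigma = H^{-1} Psi (H^T)^{-1}, computed with linear solves instead of explicit inverses."""
    left = np.linalg.solve(H, Psi)           # H^{-1} Psi
    return np.linalg.solve(H, left.T).T      # (H^{-1} (H^{-1} Psi)^T)^T = H^{-1} Psi H^{-T}

# Illustrative (made-up) Hessian and score covariance for d - 1 = 2 free coefficients.
H = np.array([[2.0, 0.3], [0.3, 1.5]])
Psi = np.array([[1.0, 0.2], [0.2, 0.8]])
Sigma = sandwich_covariance(H, Psi)

n = 500
std_errors = np.sqrt(np.diag(Sigma) / n)     # approximate s.e. of the entries of theta_hat_{-d}
print(Sigma)
print(std_errors)
```

Using linear solves rather than forming $\{H^*\}^{-1}$ explicitly is a standard numerical choice; in practice $H^*$ and $\Psi$ would themselves have to be estimated, which the theorem does not address.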



References 



Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New York.






Carroll, R., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92 477-489.

Chen, H. (1991). Estimation of a projection-pursuit type regression model. Ann. Statist. 19 142-157.

de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag, New York.

DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation: Polynomials and Splines Approximation. Springer-Verlag, Berlin.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman 
and Hall, London. 

Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. 
Assoc. 76 817-823. 

Hardle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.

Hardle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models. Ann. Statist. 21 157-178.

Hardle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by the method 
of average derivatives. J. Amer. Statist. Assoc. 84 986-995. 

Hall, P. (1989). On projection pursuit regression. Ann. Statist. 17 573-588. 

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, 
London. 

Horowitz, J. L. and Hardle, W. (1996). Direct semiparametric estimation of single-index 
models with discrete covariates. J. Amer. Statist. Assoc. 91 1632-1640. 

Hristache, M., Juditski, A. and Spokoiny, V. (2001). Direct estimation of the index coefficients in a single-index model. Ann. Statist. 29 595-623.

Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31 
1600-1635. 

Huang, J. and Yang, L. (2004). Identification of nonlinear additive autoregressive models. J. 
R. Stat. Soc. Ser. B Stat. Methodol. 66 463-477. 






Huber, P. J. (1985). Projection pursuit (with discussion). Ann. Statist. 13 435-525. 

Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58 71-120.

Klein, R. W. and Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica 61 387-421.

Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of 
a backfitting projection algorithm under weak conditions. Ann. Statist. 27 1443-1490. 

Pham, D. T. (1986). The mixing properties of bilinear and generalized random coefficient 
autoregressive models. Stochastic Anal. Appl. 23 291-300. 

Powell, J. L., Stock, J. H. and Stoker, T. M. (1989). Semiparametric estimation of index coefficients. Econometrica 57 1403-1430.

Tong, H. (1990). Nonlinear Time Series: A Dynamical System Approach. Oxford University Press, Oxford, U.K.

Tong, H., Thanoon, B. and Gudmundsson, G. (1985). Threshold time series modeling of two Icelandic riverflow systems. Time Series Analysis in Water Resources, ed. K. W. Hipel, American Water Research Association.

Wang, L. and Yang, L. (2007). Spline-backfitted kernel smoothing of nonlinear additive 
autoregression model. Ann. Statist. Forthcoming. 

Xia, Y. and Li, W. K. (1999). On single-index coefficient regression models. J. Amer. Statist. 
Assoc. 94 1275-1285. 

Xia, Y., Li, W. K., Tong, H. and Zhang, D. (2004). A goodness-of-fit test for single-index models. Statist. Sinica 14 1-39.

Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension 
reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410. 

Xue, L. and Yang, L. (2006a). Estimation of semiparametric additive coefficient model. J. Statist. Plann. Inference 136 2506-2534.

Xue, L. and Yang, L. (2006b). Additive coefficient modeling via polynomial spline. Statistica Sinica 16 1423-1446.



Table 2: Report of Example 2

                        Average MSE              Time
  n      d        MAVE       SIP           MAVE       SIP
  50     4        0.00020    0.00018       1.91       0.19
  50     10       0.00031    0.00043       2.17       0.10
  50     30       0.00106    0.00285       2.77       0.13
  50     50       0.00031    0.00043       3.29       0.10
  50     100      0.00681    0.00620       5.94       0.31
  50     200      0.00529    0.00407       27.90      0.49
  100    4        0.00008    0.00008       3.28       0.09
  100    10       0.00012    0.00017       3.93       0.13
  100    30       0.00017    0.00058       5.41       0.15
  100    50       0.00032    0.00127       8.48       0.16
  100    100      -          0.00395       -          0.44
  100    200      -          0.00324       -          0.73
  200    4        0.00004    0.00003       5.32       0.17
  200    10       0.00005    0.00007       7.49       0.24
  200    30       0.00006    0.00017       10.08      0.26
  200    50       0.00007    0.00030       15.42      0.24
  200    100      0.00015    0.00061       40.81      0.54
  200    200      -          0.00197       -          1.44
  500    4        0.00002    0.00001       14.44      0.76
  500    10       0.00002    0.00003       24.54      0.79
  500    30       0.00002    0.00008       32.51      0.83
  500    50       0.00002    0.00010       52.93      0.89
  500    100      0.00003    0.00012       143.07     0.99
  500    200      0.00004    0.00020       386.80     1.96
  500    400      -          0.00054       -          4.98
  1000   4        0.00001    0.00001       33.57      1.95
  1000   10       0.00001    0.00001       62.54      3.64
  1000   30       0.00001    0.00002       92.41      1.95
  1000   50       0.00001    0.00003       155.38     2.72
  1000   100      0.00001    0.00005       275.73     1.81
  1000   200      0.00008    0.00006       2432.56    2.84
  1000   400      -          0.00010       -          9.35


[Figure 1: panels (a), (b), (c) and (d)]

Figure 1: Example 1. (a) and (b) Plots of the actual surface $m$ in model (4.1) with respect to $\delta = 0, 1$; (c) and (d) plots of various univariate functions with respect to $\delta = 0, 1$: $\{\mathbf X_i^T\theta_0, Y_i\}$, $1 \le i \le 50$ (dots); the univariate function $g$ (solid line); the estimate of $g$ obtained by plugging in the true index coefficient $\theta_0$ (dotted line); and the estimate of $g$ obtained by plugging in the estimated index coefficient $\hat\theta$ (dashed line), where $\hat\theta = (0.69016, 0.72365)^T$ for $\delta = 0$ and $(0.72186, 0.69204)^T$ for $\delta = 1$.

[Figure 2 panels: n=100, d=10; n=100, d=50]

Figure 2: Example 2. Plots of the spline estimator of $g$ with the estimated index parameter $\hat\theta$ (dotted curve), the cubic spline estimator of $g$ with the true index parameter $\theta_0$ (dashed curves), the true function $m(\mathbf x)$ in (4.2) (solid curve), and the data scatter plots (dots).

[Figure 3 panels: Density Estimation, d=10; Density Estimation, d=50]

Figure 3: Example 2. Kernel density estimators of the 100 values of $\|\hat\theta - \theta_0\|/\sqrt d$.

Figure 4: Time plots of the daily Jokulsa Eystri River data: (a) river flow $Y_t$ (solid line) with its trend (dashed line); (b) temperature $X_t$ (solid line) with its trend (dashed line); (c) precipitation $Z_t$ (solid line) with its trend (dashed line).

Figure 5: (a) The scatter plot of the river flow ("+") and the fitted plot of the river flow (line); (b) residuals of the fitted SIP model; (c) out-of-sample rolling forecasts (line) of the river flow for the entire third year ("+") based on the first two years' river flow.



